Preface


The Education Amendments of 1978 (P.L. 95-561), which
reauthorized the major federal elementary and secondary
school programs, included the following provisions:


STUDY OF EVALUATION PRACTICES AND PROCEDURES 

SEC. 1526. The Commissioner of Education 
shall conduct a study of evaluation practices 
and procedures at the national, State, and
local levels with respect to federally funded 
elementary and secondary educational programs 
and shall include in the first annual report to 
Congress submitted more than one year after the 
date of enactment of this Act proposals and 
recommendations for the revision or
modification of any part or all of such 
practices and procedures. Such proposals and 
recommendations shall include provisions— 

(1) to ensure that evaluations are based 
on uniform methods and measurements; 

(2) to ensure the integrity and 
independence of the evaluation process; 
and 

(3) to ensure appropriate follow-up on 


started late in 1979 and completed
the new Department of Education, 1 
OE. 

It was explicit in the request 
core of the study would be a repoi 
committee. The Committee on Prog: 
Education came to life in early 1< 
auspices of the Assembly of Behav 
Sciences. Its membership was sel< 
appropriate disciplines as well a: 
and responsibilities regarding ev« 
of the fact that the problems to ] 
much to the organization, managenu 
evaluation as to questions of eva! 
methodology, and quality. The dis 
the Committee included communicat 
educational administration, educa 
experimental psychology, politica 
psychology, sociology, sociology 
statistics (psychometrics). The
included: carrying out large-sca 
evaluations in different settings 
school system, private sector); c 
and managing more general progran 
social research and development ( 
government agencies; serving as s 
congressional education committee 
pertinent research on methodology 
evaluations and on social R&D. < 
conducted general assessments of 
The Committee held three two-< 
working conference to develop th 
report. Richard A. Berk of the i 
Santa Barbara, assisted the Comm 
during the working conference, 
meetings, the Committee focused 
issues to be addressed. Senior 
of Education and from education 
met with the Committee to give u 
views. (See Appendix D for a 13 



addressed: location of evaluation activities within the
Department, coordination of evaluation within the
Department, participation in evaluation design and use for
program and planning officials, and continuing advisory
mechanisms for evaluation. Department staff also raised
the following nonorganizational issues: distinguishing
among types of evaluations, planning of evaluations,
strategic considerations in evaluation management, and
appropriate utilization.

Starting from those expressed concerns, the Committee
explored other related issues and came to organize the
report around four major topic areas: distinguishing
between evaluation types and choosing appropriate
strategies and procedures; improving the quality of
evaluations; increasing the effective use of evaluations;
and improving the organization and management of
federally funded evaluations in education. The
congressional concern with uniform methods and measures
was subsumed under the broader topic of evaluation
strategies and procedures, since consideration of methods
and measures is possible only in the context of a
specific set of policy questions and after an evaluation
strategy and procedure have been determined.

In carrying out its study, the Committee relied on 
various kinds of information to supplement the members' 
knowledge and experience. Members and staff conducted 
informal interviews with employees and ex-employees of 
OE, of the Department of Education, of other federal R&D
support agencies, and with congressional staff familiar 
with the provision calling for the assessment of 
evaluation practices. (For a list of persons 
interviewed, see Appendix D.) Two papers were 
commissioned from consultants to supply detailed 
information on the evaluation activities within the 
Department and on the performer communities that carry 
out evaluation studies; they appear as Appendixes A and
B. A third paper, contributed by Committee member Freda
M. Holley, provided insight into evaluation activities,









of the report that helped improve the final version. We
are grateful, too, to David A. Goslin, executive director
of ABASS, for his support and valuable suggestions, to
Eugenia Grohman, associate director for reports of ABASS,
who critically edited the report, and to Elaine 
McGarraugh, editorial assistant, who supervised its 
production. 

Finally, we wish to thank Rose S. Kaufman, whose 
administrative support early on facilitated the 
organization and first meetings of our Committee, and 
Diane L. Goldman, who ably took over from her as our 
administrative secretary, typed the many versions of the
report, and provided us with much needed logistical 
support and technical assistance. 

Peter H. Rossi, Chair 

Committee on Program Evaluation in Education 




Summary 


Evaluation as an established field of applied social
science research has grown rapidly over the last 20
years, accompanied by the expectation that the empirical
knowledge resulting from evaluation studies would improve
the process of making decisions about social programs.
In education, more than $40 million is now spent per year
for evaluation activities by the Department of Education;
about $60 million more in federal funds is spent by other
federal agencies and by state and local agencies. But as
the number of evaluation studies and their sophistication
have grown, so has concern that evaluation work has not
lived up to its potential. In response to such concerns
on the part of Congress, the Committee on Program
Evaluation in Education examined four aspects of
evaluation in education: the varieties of evaluation and
their respective roles; the quality of evaluation
efforts; the use of evaluation results; and the
organization and management of evaluation activities. We
focused on these topics because they were identified to
be of greatest interest to the two primary audiences for
our report: members of Congress and their staffs and

operating programs, the traditional focus of
evaluation, does not answer some important questions;
research is also needed in planning and implementing
programs. During the planning phase, there are questions
of need and how to meet those needs. Survey and
ethnographic studies can establish the extent and
distribution of an educational problem; controlled pilot
testing and field tests can determine the effectiveness
and feasibility of alternative interventions for
relieving the problem; and economic analyses can be used
to make cost estimates. Once a program is established
and operating, there are questions of fiscal and coverage
accountability. Analyses of administrative records can
determine whether funds are being used properly and
whether the program is reaching the intended
beneficiaries, although supplementary fiscal audits and
beneficiary studies are sometimes required. Finding out
whether the program is being implemented appropriately
requires, in addition to program administrative records,
special surveys of program services and ethnographic
studies. Finally, there are questions of program impact;
they can be addressed definitively only through rigorous
and often costly research methods. Consequently, impact
evaluation should be undertaken only if the requisite
skills and resources are available.

Not all programs can be fully evaluated: that is, not
all questions can be answered for all programs. In
particular, meaningful impact evaluation is possible only
for programs for which intended beneficiaries and effects
can be clearly specified. There are two kinds of
programs for which such specification is extremely
difficult or impossible. For a program having vague
goals or many diverse goals, evaluators and those who
commission an evaluation must be able to agree on which
goal should be assessed and whether appropriate measures
are available to assess it. For a program in which local
sites are given autonomy to develop their own specific
objectives and means of reaching them, one cannot

unwilling to compete under such conditions. The lack of
diversity among evaluation contractors reduces the
possibility of new ideas entering the evaluation system
and thereby improving it. Perspectives of beneficiary
populations, in particular, are underrepresented on both
the sponsor and the performer sides.

Flexibility in evaluation, which could contribute to
quality, has also been reduced because of emphasis in the
past on large studies. The restrictions on creativity
imposed by this approach are aggravated when a single
individual or small group within the Department develops
the main procurement instrument, as is usually the case.
An additional constraint on flexibility and creativity is
the current monitoring process, which makes it difficult
to adjust the course of a study because of changed field
conditions or because a different research direction is
warranted.

A third explanation for problems of quality is that
the intellectual marketplace for appraisal and scrutiny
of evaluations has yet to be fully formed. Generally,
there is no review by outside experts during the
procurement phase when the main elements of a study are
being designed; the lack of diversity among competitors
for evaluation work further inhibits opportunities for
the marketplace to operate; and, upon completion of a
study, external review of final reports happens only
sporadically. Institutional mechanisms for encouraging
ample discussion by experts and parties at interest of
plans for and findings of major studies are spotty at the
federal level; they are largely absent at the state and
local levels.


Using the Results of Evaluation 

A frequently voiced criticism of evaluation is that
evaluation findings are seldom used. Implicit in this


also be considered and not used because 
inappropriate or because the indicated c 
policy are infeasible. Moreover, even * 
immediately discernible use of knowledge 
evaluations, it cumulates over time and 
absorbed, eventually leading to changes 
decision perspectives. 

There are important limits to the use 
results in the short run. Social proble 
ought to be a political process; the for 
impinging on decisions about programs ar 
powerful than empirically derived eviden 
environment in which decisions are made 
swift and unilateral action; new informa
slow down the process, since it may make 
complicated. For these reasons, while e 
sponsors should do their best to dissemi 
findings, they cannot ensure utilization 

Dissemination can be improved in a nu 
however. At the very least, evaluation 
communicated to the primary audience. C 
must be available; primary data should b
reanalysis. Unfortunately, none of thes 
dissemination steps is now routine. Ass 
information is made available, other imp 
affecting its use include whether it is 
objective and whether it is structured a 
way that is relevant to potential users, 
also important, particularly when direct 
specific decisions is intended. 

Because evaluation results are more 1 
when they address issues of importance t 
audiences, concern with the use of evalu 
cannot begin when final reports are read 
disseminated. The primary audience and 
needs of a given evaluation should be id 
inception of the study. Such initial id 
help define the type of evaluation to be 
issues to be addressed, the sort of info 
collected, and the form of reporting and 
that is likely to be most effective. Th 
evaluation reports is often a barrier tc 
must be intelligible to the intended aud

The Committee has two sets of recommendations, one for
Congress and one for the Department. The recommendations
are presented and the discussion of them summarized in
the following two sections; the chapter numbers in
parentheses indicate where the more detailed discussions
are found.


Recommendations to Congress

The first recommendation to Congress is concerned with
obtaining a better match between the information that
results from evaluation studies and the information that
is useful in making decisions about programs. The next
three recommendations, C-2, C-3, and C-4, are intended to
improve oversight and accountability for evaluations
carried out with funds from federal education programs.
The last recommendation to Congress addresses management
constraints external to the Department.


Recommendation C-1. When Congress requests evaluations,
it should identify the kind of question(s) to be
addressed. (Chapter 2)

Given the diversity of evaluation activities,
misunderstandings about what information is needed have
frequently arisen between Congress and the Department and
its evaluation contractors. Congress should attempt to
make more explicit whether it needs information about
program services, about program coverage, about program
impact, or about other program aspects. Such clarity
will make it more likely that useful information will be
delivered as a result of an evaluation effort. The
primary audience(s) for the results of the requested
evaluations should also be identified, since different
audiences need different types of information.

Clarity of congressional intent can be brought about
in two ways. When specificity about questions and
audiences is not possible ahead of time, evaluation st

Organizing and Managing Evaluation Activities

The Department of Education has accountability and
oversight responsibilities with regard to federal
education programs and must carry out evaluation
activities that address those responsibilities. The
Department should also develop knowledge about programs
that can be used to improve both their management and
their contribution to more effective education. Finally,
the Department should be able to formulate new programs
based on tested alternatives that speak to unmet needs in
education.

At present, evaluation responsibilities are assigned
to several different units within the Department, and to
state and local agencies. Fiscal audits and
investigations on compliance with civil rights laws are
appropriately carried out by offices created specifically
for these functions. Similarly, local and state agencies
are appropriately responsible for supplying fiscal and
beneficiary information needed to administer federal
programs. However, the assignment of other types of
evaluation responsibilities among levels of government
and within the Department varies remarkably from program
to program, despite the existence of a central evaluation
unit.

Though some decentralization of activities is
appropriate, assignment of responsibilities should be on
a more systematic and purposeful basis. The Committee
suggests the following guidelines:

• Collection of information on beneficiaries served
and on allocation of resources should continue to be a
requirement for state and local agencies. When agencies
do not have adequate capability for accurate reporting,
technical assistance ought to be provided. An important
caveat is that reporting requirements should not generate
more information than can be digested at the level

RECOMMENDATIONS

the National Institute of Education, for school systems
and states that would provide for funding a few of the
most technically promising proposals for impact
assessments of local programs or for program improvement
based on evaluation of alternative program strategies.


Recommendation C-4. Congress should require an annual
report from the Department of Education on all evaluation
expenditures and activities. (Chapter 3)

The annual evaluation report currently required from
the Department should be expanded to cover all federally
funded evaluation activities in education, including all
of those in the Department as well as those carried out
by state and local agencies. Expenditures at all levels
should be specified; activities, findings, and their use
should be briefly described.


Recommendation C-5. Congress should authorize a study
group to analyze the combined effects of the legislative
provisions and executive regulations that control
federally funded applied research. (Chapter 5)

One of the causes of the lack of timeliness and
relevance of evaluation studies is the accumulation of
rules and regulations governing the whole process of
funding and carrying out applied research in the social
service area. While almost every provision now on the
books or enforced through executive practice is there to
provide some safeguard and may be reasonable when
considered in isolation, in the aggregate they have
negative effects. The trade-offs between the benefits of
the safeguards and the obstacles they create against
producing timely and relevant applied research at
reasonable cost deserve careful scrutiny. Simplification

development of the evaluation field, the
budget, or agency personnel ceilings. T1 
recommendations on procedures are organi 
intended to develop better strategies fo 
evaluation planning within the Departmen 
planning individual studies; those inten 
the quality of evaluations, including th 
and technical assistance; and those inte 
facilitate use. The last three recommen 
improvements needed in general managemen 


On Evaluation Strategy 

Recommendation D-1. In evaluations init
Department of Education, the kinds of ex 
activities to be carried out should be s 
and should be justified in terms of proc 
or program implementation . (Chapter 2) 

This recommendation is analogous to I 
to Congress. It emphasizes the need to 
what type of evaluation activity is appi 
given stage of planning or implementatic 
program or an existing program. For ex< 
Department officials need to specify wh< 
know about a program, why they wish to 1 
specified time, and what audiences othe. 
have information needs that must be sat: 
evaluation activities. All these needs 
coordinated with legislated requests fo 
(See also Recommendation D-10 on planni 


Recommendation D-2. When pilot tests o 
programs are conducted, pilot tests of 
requirements should be conducted simult 
determine their feasibility and appropr 
(Chapter 2)

benefits, how programs should account for and measure
costs, which testing instruments and procedures are
disruptive and which are not, how large a sample of
beneficiaries is needed to get valid program
measurements, and so forth. If a pilot test of an
evaluation were carried out in conjunction with the pilot
test of a program, the design of both the program and of
the evaluation requirements would be strengthened.


Recommendation D-3. The National Institute of Education
should continue and strengthen its program of support for
research in evaluation methods and processes. (Chapter

The advances made in the technical aspects of
evaluation have been considerable, but uneven. The
Committee believes that too much attention has been given
to investigating problems in the use of randomized
controlled experiments. Other important problems in
methodology have not received sufficient attention, for
example, methods for studying the delivery of services,
for investigating the properties of achievement tests
when used in the evaluation of programs, and for
assessing the impact of programs that cannot be studied
through the usual experimental paradigms. Another
neglected area of research is the process of evaluation
itself: how studies are commissioned and initiated, how
they are managed, what laws and procedures impinge upon
them. The Committee's work indicates that current
procedures constrain the quality and the use of
evaluations, but how these processes operate is poorly
understood; therefore, it is difficult to design
effective remedies.


On Quality, Training, and Technical Assistance

to be helped by federal education programs.
A primary training need concerns the 
underrepresentation of minority group me 
educational evaluation enterprise. Well 
education programs target minority group 
recipients of services. The Committee b 
quality of evaluation would be improved 
of minority persons who are also well tr 
technically. For example, intimate pers 
the circumstances of beneficiaries will 
outcome measures that are more relevant 
and more closely related to improving th 
of programs. Hence, we believe that sue 
should be represented to the fullest ext 
the evaluation of such programs. Fellow
internship programs in evaluation that i] 
priorities for minority group persons wo 
valuable; they would produce good resear* 
would enrich the evaluation system. 

A second concern related to training 
relationship between the evaluator and t 
or educator. The communication gap betw< 
inhibits the use of evaluation may be na 
appropriate training on both sides. Exe< 
program staff would benefit from greater 
language of evaluation and how evaluatioi 
evaluators need exposure to the problems 
constraints of federal education program' 
also need to improve interpersonal and c 
skills in order to convey evaluation inf< 
effectively. 

Technical training for evaluation sta 
necessary, both within the federal gover
state and local levels. There have neve 
numbers of staff trained in either rigor 
methods or in research, and there have h 
developments in the field. Evaluation i 
practiced by those from almost every tvr 
possible, including many with no more nr 
that of classroom teaching. Practicing

Recommendation D-5. The Department of Education should
structure the procurement and funding procedures for
evaluations so as to permit more creative evaluation work
by opening up the process and allowing a period for
exploratory research. (Chapter 3)

The more complex the evaluation, the less likely it is
that one can spell out ahead of time the best methods for
addressing the questions that the evaluation is designed
to answer. The current RFP process in particular ignores
this fact. The Committee believes that the RFP process
can be made more flexible. RFPs for large studies should
include a period of exploratory research; they should
also provide for side studies that address questions
integral to the evaluation that emerge after it is under
way. Proposers should be given the freedom to specify
alternative methods and to suggest side studies. Most
important, sufficient time for developing proposals must
be allowed.

Mechanisms other than RFPs for funding evaluations should
also be used to open up the system. For example,
unsolicited and solicited proposals, 8-A contracting,
cooperative agreements, basic ordering agreements, and
grant awards are each appropriate to given evaluation
tasks. The Committee's recommendation that a greater
variety of funding methods be employed does not imply
that the use of RFPs be drastically reduced. Flexibility
in the award process, we believe, will permit the
introduction of new ideas that may contribute to
higher-quality evaluations. Flexibility will also allow
greater participation by minority organizations and
researchers.


Recommendation D-6. All major national evaluations 
should be reviewed by independent groups at the design,
award, and final report stages. Review groups should 
include representatives of minorities and other consul 






When possible, ethnographic data and
material, similarly treated to protect
confidentiality, should also be made av
Making primary data from evaluation
require support in major evaluation
documentation, storage, and
creation of explicit agency poli
Since the objective is to
of the methods and findings of ma
independent review and reanalysis
the Department as part of its eval

for state and local evaluation needs. (Chapter


The technical assistance needs of state and 1 
agencies are not uniform. They vary with the si 
agency, the sophistication of the agency’s evalu 
staff, and with the complexity of the federal pr 
activity in the agency. The technical assistanc 
associated with Title I are one approach to meet 
needs. Another approach would be to identify or
exemplary models of monitoring and reporting and 
disseminate the procedures involved. A third ap 
would be to develop the capability of state agen 
provide technical assistance to less sophisticat 
agencies. 

Technical assistance should also cover organi 
and personnel issues. In particular, state and 
agencies need to be aware of the desirability of 
separating an evaluation unit from program admin 
in order to avoid conflicts of interest. Work a 
done by some state and local agencies on optimal 
institutional arrangements, personnel reguiremer 
procurement policies for extramural work can for 
basis of advice and assistance to others. (See 
Recommendation D-16 on minimum requirements for 
monitoring and compliance reporting.) 


On Utilization 

Recommendation D-9. The Department of Education 
test various mechanisms for providing linkage be 
evaluators and potential users . (Chapter 4) 

The Department should consider establishing a 
charged with studying, developing, and instituti 
knowledge transfer mechanisms and evaluating the 
effectiveness. Alternatively, outside experts m 
charged with this responsibility. Appropriate a 
would include assessing proposed dissemination p

A workable planning system must p
appropriate information to be avails 
legislative decision cycles on educa 
must accommodate an ongoing program 
addressing problems that are poorly 
must be sufficiently flexible to all 
interesting but unanticipated questi
result of ongoing research, changes 
development of new programs. The ev 
major education program should contc 
studies, some of which furnish factu 
reasonably short time and some of wl: 
long-term interest. 

Although planning does not necess 
agenda that is subsequently carried 
planning almost always leads to an : 
priorities, provides a forum in whic 
can reach accommodations, and induci 
opposed to a reactive stance toward 


Recommendation D-11. The Departmen
establish a quick-response capabili
but unanticipated evaluation questi

In order to be fully responsive 
needs of its primary audiences, the 
able to combine a deliberative plan 
allows time for field and constitue 
quick-response capability that can 
but critical evaluation questions < 
Department staff charged with evalv 
should be able to respond within 2 
evaluation-related questions to wh: 
top-level Department officials see 
Several extramural mechanisms are 
purpose, for example, maintaining 
contractors who can be given speci 
short notice or using 8-A contract 
SBA-eligible firms. 




In order to increase the relevance of evaluat 
results, primary audience(s) must be specified p
the beginning of a study. When conditions chang 
the course of a study that might affect the usab 
the findings, study objectives and design should 
reconsidered to ensure that the study will remai 
relevant. Efforts should be made to deliver rep 
time, especially when study results are intended 
decisions that are made at specified times. 


Recommendation D-13. The Department of Educatio 
ensure that dissemination of evaluation results 
adequate coverage. (Chapter 4)

All RFPs and grant announcements should inclu 
requirements for a dissemination plan oriented t 
utilization, and proposal evaluation should give 
appropriate weight to the quality of the propose 
dissemination plan. Dissemination plans should 
specification of audiences and their information 
strategies for reaching the audiences, provision 
adequate number of report copies and other mater 
mechanisms for adapting the dissemination plan a 
study proceeds. Budget negotiations should reco 
that adequate dissemination is costly and cannot 
afterthought. 


Recommendation D-14. The Department of Educatio
observe the rights of any parties at interest an 
public in general to information generated about 
programs. (Chapter 4)

Findings from evaluations must be made availa 
those who are importantly affected by the progra: 
evaluated, including those who manage them, thos 
provide program services, and those who are inte 
necessary flexibility with regard to di
reports and other dissemination strateg 


Recommendation D-15. The Department of 
give attention to the identification of 
user audiences and develop strategies t 
information needs. (Chapter 4)

Perhaps the most neglected audience 
studies consists of program beneficiari 
representatives. We believe that this 
much intentional as it is produced by t 
difficulties of defining this set of ai 
reasonable way. In order to more close 
ideal that all those having a recogniz< 
program should have reasonable access 1 
results, the Department should consider 
evaluation reports freely to groups an< 
that claim to represent major classes < 
education programs. Positive, active < 
such right-to-know groups may include 
activities as ascertaining their infon 
to evaluation design and during the ev 
standard lists of groups and organizat 
evaluation results are routinely disse 
seeking out comments and critiques of 
reports. Since it is to be expected t 
right-to-know groups will be different 
evaluations, careful consideration of 
right-to-know groups should be part of 
plans that contractors are asked to pi 
their response to RFPs and grant annor 







On General Management 

Recommendation D-16. The Department of Education should
clearly spell out minimum requirements for monitoring and
compliance reporting and set standards for meeting the
requirements. (Chapter 5)

Such data items as distribution of funds, number and
types of beneficiaries being served, and specific program
services should be defined by the Department so that
local and state agencies will know exactly what reporting
is required of them. Quality control procedures should
be enforced so that adequate performance reports can be
made to Congress. Before setting the requirements,
however, the Department needs to examine its own capacity
to deal with local and state reports in order to avoid
collecting information that is never used because of the
sheer inability of federal staff to deal with the volume
of reports. The objective of this recommendation is to
improve the quality of data needed for accountability
without increasing the burden of response on local and
state agencies. To accomplish both ends, admittedly
somewhat difficult to reconcile, the Department should
consider appropriate development research on what kinds
of procedures would minimize response burden and at the
same time ensure sufficient data quality.


Recommendation D-17. The Department of Education should
examine staff deployment and should establish training
opportunities for federal staff responsible for
evaluation activities or for implementation of evaluation
findings. (Chapter 5)

The Department should consider alternative ways of
using the technical staff within the central unit and the
evaluation staff in other units. The greater the degree
of government involvement in an activity, the greater the







should have practical program knowledge
training programs should be made availc 
staff members adequately for their tasl 


Recommendation D-18. The Department of
take steps to simplify procedures for t 
evaluation studies, carrying them out, 
their findings. (Chapter 5) 

The Committee is aware that our recc 
opening up the system and for involving 
and other parties at interest during va 
complicate and prolong the evaluation p 
we firmly believe that this can be more 
for by simplifying and improving intern 
procedures now used by the Department. 

The procurement process has become n. 
restrictive and inflexible but very cos 
staff time and to proposers, though the 
is recouped eventually through overhead 
ways, so that the government bears the < 
Other sources of delay, once a contract 
study has been awarded, must also be id. 
addressed. This applies particularly t 
procedures and to monitor and agency hai 
for changes in study design, sampling p 
testing, analysis, time frame, and the 
Department should consider sanctions an 
encourage timely performance, and it sh 
responsible for timely dissemination 
Our call for timely performance on s 
intended to feed into a specific legisl 
management decision in no way invalidat 
more deliberative approach in certain c 
times, especially when an effort is bei 
a problem that is little understood, wh 
important to promote a variety of studi 
emerging leads than to mount a formal E 
provide a definitive answer by a specif 
such cases, however.




Introduction 


BACKGROUND 

In the broadest sense, evaluation has always been done.

In its more narrow modern usage, "evaluation" has come to
mean the use of recently developed research tools and
concepts of the social sciences to develop evaluation
knowledge. What has social-science-based evaluation
contributed to education? Two examples, one of national
scope, the other local, illustrate how such evaluations
illuminate and sometimes contradict judgments derived in
other ways; they thus increase knowledge about what
affects the educational process and how it in turn may
affect educational and social goals.

In 1959 James B. Conant published his widely read
report on the American high school, recommending, among
other things, the consolidation of small high schools
into large comprehensive schools and an increased
emphasis on English composition, mathematics, and
science. His report, based on visits to several dozen
high schools, was essentially the application of his
judgment as an experienced educator to what he saw as
typical practice in better schools in comparison with
of social science %r ® d ° n the c
schools even though therTwas llZed What 
schools he studied fairw n ° 6Vlden 
showed that characteristics of schools, teachers, or
principals counted very little in comparison with family
background. Indeed, the major difference between schools
was accounted for by the differences in the mixes of
students from various backgrounds, with school facilities
and financial expenditures also counting for very
little. This finding profoundly shocked the field of
education. The main policy implication of the finding
was that changing the academic achievement of children
through changing the schools was not going to be an easy
job entailing merely changes in curricula, upgrading of
teachers, or providing more financial support to the
schools.

The importance of testing alternative explanations is
shown as dramatically in a recent study (Robertson 1980)
of the effect of dropping driver education from the
curricula of some Connecticut high schools. In 1976, the
Connecticut state legislature decided to discontinue
subsidizing driver education in the state's high
schools. In response, some of the high schools dropped
driver education entirely from the curriculum while some
retained it, financing the classes from local funds.
Robertson tested the impact of this change on automobile
accidents involving young persons aged 16 and 17 by
comparing the number of accidents in counties in which
driver education was retained with counties in which it
had been dropped. He noted that over a 2-year period,
the number of accidents involving persons aged 16 and 17
declined drastically in the communities that had dropped
the course.

It would have been easy to conclude that driver
education was not efficacious in training careful
drivers, or even that it produced more reckless drivers,
but Robertson tested a number of reasonable alternative
explanations. The most plausible of these alternatives
was indicated by a drop in the number of drivers aged 16
and 17 in those communities that dropped driver
such programs has been to alleviate a i
societal problems, from unemployment t 
scores of some children in public scho< 
substandard housing to recidivism of f 
addiction to the inadequacies and ineq 
health care system. But as a number o 
failed to live up to the expectations 
their creation, even as their costs es 
were raised as to the reasons for the 
performance. In response, federal age 
sponsored and conducted a diversity of 
activities, obligating nearly a quarte 
dollars for that purpose in fiscal 197 
more than 2,000 staff years on the par 
federal evaluation staff (Office of Ma 
1977). 

Nowhere has the growth of programs 
growth of evaluation been more pronoun 
field of education. The federal part 
income grew from 4.3 percent in 1962 t 
1974, from $1.6 billion to $6.6 billic 
1977-78 dollars). The most rapid incr 
mid-1960s; by 1966 the federal contrik
percent, close to the current level (E 
1979). The increase was largely the i 
landmark Elementary and Secondary Educ 
1965 (reauthorized and added to sever* 
recently in 1978), which mandated a m
funded programs to improve the school 
disadvantaged children. Title I, whic 
compensatory education for poor child] 
continues to be, the keystone program 
legislation. To date, more than $26 ] 
funds has gone to state agencies and : 
under Title I (Kirst and Jung 1980). 

Evaluation activities lagged a few 
though the first legislative requirem 
was built into the original Title I 1 
time the program was 7 years old, mor 
and local levels. The objectives of the evaluations have
been to establish whether programs are in conformance
with legislative provisions, whether programs are managed
effectively, and whether programs are achieving the
desired goals. It was assumed that evaluation would
answer those questions and, moreover, provide information
that could be used to remedy identified deficiencies.

But achieving evaluations that yield answers has been
as elusive as achieving successful programs. Early
evaluations faced technical problems and failed to
anticipate the highly politicized context that surrounded
the programs being evaluated. As evaluators learned to
cope with some of the early problems, more evaluations
were funded, and in 1970 the Office of Education (OE)
established a central evaluation unit (see Appendix A)
and placed at its head an evaluator of some stature. But
criticism has not abated. Those who sponsor evaluations
or are in a position to use them continue to voice their
disappointment, often finding results irrelevant or not
delivered in time for making decisions on programs.

Because of the theoretical and technical problems and
because of questions on its contribution to formulating
social policy, the field of evaluation has been marked by
a considerable amount of self-inspection. A large number
of studies and books have been devoted to analyzing
evaluation, gauging its effectiveness with respect to
making policy decisions, developing improved methodology,
and appraising the quality of individual studies. For
example, a recent review of program evaluations (Boruch
and Cordray 1980) cites more than 150 references devoted
to critiques and analyses of individual studies or of the
field in general; another recent comprehensive overview
(Cronbach et al. 1980) cites nearly 200 such references,
and both these works concentrate largely on the field of
evaluation in education.

Many of the published articles and books include
recommendations for improving evaluations and making the
audiences for its report, we identified
audiences as members of Congress and th 
senior executives within the new Departi 
for two reasons. First, these two grou 
specific complaints about the effective] 
evaluation and had asked for recommenda 
improvement. Second, most of the liter 
the field of evaluation is addressed to 
practitioners, rather than to the spons* 
users of evaluations. In the Committee 
critical self-inspection that has chara 
evaluation field has been a mainspring 
of this rather young branch of applied 
While such criticism must continue to p 
to deficient theory and practice (and t 
must speak to its own specialist audien 
continue to miss the mark for those out 
"experts”—the very individuals and gro 
decisions about social programs and who 
to commission and use evaluations. Thi 
primarily addressed to them, and our re 
for the legislators and the agency exec 
obtain greater effectiveness and use fr 
program evaluation in education. 

In addition to our main audiences, w 
report will also be of interest to sevs 
audiences. One such audience includes 
education authorities, who carry out e\ 
activities with federal education funds 
instances, our recommendations concern 
even when this is not the case, they he 
evaluations are commissioned and carri< 
federal level because the programs beir 
the responsibility of state and local < 
concerned with assuring that federal e< 
meet the goals intended by the legisla 
audience. An improved evaluation syst< 
information to carry out their oversig 
effectively. In particular, such info 
to groups interested in furthering equ 
audience for our recommendations since we intend those
recommendations to have an impact on how evaluation is 
done and used. 


SCOPE OF THE REPORT 

Among researchers, the term "program evaluation"
traditionally has been applied to the assessment of the
impact of a given program. Generally, this has included
answering two kinds of questions: To what degree have
the changes intended by the program been achieved? To
what extent can the observed changes be attributed to the
program? Early in the Committee's proceedings, however,
it became clear that this definition was too limited for
our task and for the audiences of this report. In the
pragmatic environment in which questions are framed about
federal education programs, distinctions between outcome
evaluations (those concerned with the above
questions) and other types of assessment are frequently
irrelevant. Congress and Department officials need to
know how funds are allocated, what kinds of program
services are being delivered to whom, how management of a
program could be improved, what program alternatives are
most effective, and which programs are most
cost-efficient. In developing new programs or changing
existing ones, questions must be answered about the
nature and extent of the need to be met and about the
effectiveness of proposed programs to meet that need. A
considerable proportion of the funds allocated to
evaluation of federal education programs goes to answer
such questions, and even studies concerned mainly with
program outcome include activities (and money) devoted to
those other issues. From discussions with congressional
and Departmental staff, it was evident that the
dissatisfaction with evaluation encompasses perceived
shortcomings in all areas and that focusing only on




being evaluated; and a number of technical matters
relating to effective collection of data and appropriate
analytical strategies. Deemphasis of such topics was not
just a matter of lack of time; it reflects the
Committee's view that those topics are less important to
our main audiences and that (particularly in the case of
technical issues) the Committee would find little new to
add to the extensive literature in the field.

Four additional issues pervaded the discussions of the
Committee, though they had not been identified
specifically in the 1978 legislative provision calling
for the assessment of OE's evaluation activities, by
legislative staff interviewed, or by Department
officials. Of these, the most important surfaced during
the very first meeting, namely, how well evaluation
activities address the broad federal mission of equal
educational opportunity. To do so effectively requires
the active participation in the whole evaluation process
of minorities and other groups intended to benefit from
federal education programs, from the planning and design
of evaluations to their ultimate use. The inadequate
consideration of the needs and viewpoints of the groups
intended to benefit from programs affects the kinds of
questions asked about programs, and insufficient
information about the results of evaluations prevents
such groups from knowing how to make programs more
effective.

The second issue developed as the Committee pursued
its questions about the current process of commissioning
and carrying out evaluations in education. As a result
of external regulations and constraints and internal
procedures, the process operates so as to limit severely
the flow of ideas and creativity that must be part of an
effective research effort, including applied research
such as program evaluation. The conditions that have led
to this undesirable state admit of no easy remedy, but
measures must be taken to open up the process if good


funds that are allocated to e 
effectively and yield more us 
Defining Evaluation


THE ROLE OF EVALUATION 

The literal meaning of the verb "to evaluate" is to 
estimate the value of some object or activity. As 
applied to education programs, evaluation includes the 
set of activities that are aimed at finding out how 
valuable a program may be. Relevant questions include:
How serious is the condition that the program is designed
to ameliorate? How is the program supposed to work?
What would happen without the program? What would happen
if the program were expanded? How valuable is the 
program compared to other programs? 

Putting things this way makes it very difficult to 
question the value of evaluation. How can one be for 
knowing the value of a program, its impact on this or 
that, or what would happen if it were altered? How can
one favor making budgetary decisions in the absence of
evaluation information of some sort? In short, how can
one opt for ignorance over knowledge? 

Although the need to know seems indisputable, 
controversy and struggles inevitably arise whenever 
social-science-based evaluations are done and reported.
First, such evaluations make program goals explicit and
thus may uncover previously hidden value disagreements.
Second, they have to compete with other forms of 
evaluation—ad hoc opinions, skillful journalistic 
methods used, -economic prioriti
A special difficulty for eve 
scarcely anyone likes to be jud 
operate programs or benefit by 
to react defensively to such ji 
results of the evaluation may t 
is difficult to tolerate* Thei 
that one's behaviors, attitudes 
misinterpreted and distorted ir 
that is incomprehensible or pr< 
one's individual identity. But 
concern that one will be misunc 
is the recognition that evaluat 
some particular point of view < 
positions. By their very natui 
neutral. Judgments are made b< 
explicit assumptions about whal 
should be. To those running a 
it, evaluators' judgments are < 
to the program and hence inapp 
It is obviously important tl 
undertaken by persons who are 
involved with the program bein 
special interests and deep con 
blind them from seeing the pro 
weaknesses. But it is also tr 
dispassion of an external obse 
lead to objectivity. Distance 
lead to disengagement from wha 
identification with and empatt 
program services and those whc 
worse, an alienation from and 
objectives and values held by 
balance precariously between < 
knowledge of the program and < 
permit them to see its streng 
The evaluation process is 
having many diverse audiences 
about the impact and effects , 
evaluated. Each audience 
For education programs, Congress and the Department of
Education constitute two highly visible and crucial
audiences. They are crucial for two reasons: first,
they can make the decisions about which program to
initiate or to expand, which to discontinue or to
contract; second, they fund evaluations. Although the
scope and responsibilities of the Congress and the
Department of Education are clearly the broadest, they
are not the only audiences to whom evaluators of
education programs must address their findings. Program
decisions about education in the United States (even of
federally supported education programs) are only partly
made at the federal level; thousands of school boards in
local communities make most of the school policy that
affects the specific character of public education.

State education agencies (SEAs) also affect what is
taught and how it is taught in each of the 50 states.
These local and state school authorities may be able to
use information provided by evaluations if the findings
are presented in ways that are relevant and
understandable. Indeed, not enough careful thought and
attention has been given to the problem of how such
information can be provided in the most understandable
and relevant ways.

Perhaps the greatest impact of evaluations is on those
who manage education programs and those who provide the
services of the programs. They are the people whose work
is being judged. These audiences have the most direct
involvement in the programs, are most likely to be
threatened by the evaluation process, and may be very
fearful that programs will be curtailed or cut off
because of an evaluation's findings. Program personnel
are, understandably, usually more concerned with the
protection of their own programs and projects than they
are with the advancement of knowledge. Their political
power can be and has been exercised to save a program
that appears to be threatened (for example, Head Start,
and has a different conception of usable knowledge. This
report argues that all of these audiences are important,
but that any particular evaluation usually should not try
to be responsive to all of them. Responding to the
myriad and often conflicting expectations of all the
audiences is likely to diminish the integrity of an
evaluation and limit its usefulness to any one audience.

The "primary" audience(s) of an evaluation should be
identified by those who call for it and by the evaluators
who carry it out; the design of an evaluation should
anticipate the primary audience(s), and the procedures,
methods, analysis, and the language of its reports should
correspond to the needs and expectations of the primary
audience(s). This does not mean that the findings of an
evaluation will be useless or wholly irrelevant to the
"secondary" audiences, but it is likely that there will
have to be some amount of translation and
reinterpretation to make the information useful to them.
Defining the audience and targeting the message will
reduce the frustration that often accompanies the more
eclectic attempts to speak simultaneously with many
tongues to many groups. Inevitably the selection of the
primary audience(s) becomes a controversial process, one
that must be endured, coped with, and responded to by the
evaluator. In the case of evaluations that are mandated
by Congress or commissioned by the Department, the
mandate should include some designation of the primary
audience(s) to which the evaluation is addressed, as a
guide to the evaluators.

The evaluation process is necessarily a controversial
one that requires more than technical and procedural
solutions. Technical matters and procedures are not
unimportant, but there are other important demands that
must be managed with equal care. Those demands include
resolving the tensions among opposing values and
perspectives, dealing with political priorities, and
taking account of contrasting methodological traditions.
THE VARIETIES OF
relative effectiveness of progra
Policy Question                   Evaluation/Social      Research Methods
                                  Research Procedure     Used

A. How big is the problem         Needs assessment       Assembly of
   and where is it located?                                (Census, N
                                                         Special sampl
                                                         Ethnographic

B. Can we do anything about       Basic research         Assembly of
   the problem?                                            studies
                                                         Specially corr

C. Will a proposed program work   Small-scale testing    Randomized
   under optimal conditions?                               experimen
                                                         Pilot studies c

D. Can a program be made to       Field evaluation       Ethnographic
   work in the field?                                    Randomized
                                                         Field tests and

E. Will a proposed program        Policy analysis        Simulation
   be efficient?                                         Prospective c
                                                           studies
                                                         Prospective c


Questions Arising for Enacted and Implemented Programs 


Policy Question                   Evaluation/Social      Research Methods
                                  Research Procedure     Used

A. Are funds being used           Fiscal accountability  Fiscal records
   properly?                                             Auditing and

B. Is the program reaching the    Coverage               Administrative
   beneficiaries?                 accountability         Beneficiary s
                                                         Sample surve

C. Is the program implemented     Implementation         Administrative
   as intended?                   accountability         Special survey
                                                         Ethnographic

D. Is the program effective?      Impact assessment      Randomized
                                                         Statistical me
                                                         Time series st

large-scale sample survey, such as the study by C
et al. (1966) on equal educational opportunity, 
assessments do not have to be undertaken solely w 
quantitative techniques. Ethnographic research m 
be instructive, especially in getting detailed kn 
of the specific nature of the needs in question; 
likely to be especially effective in determining 
nature of a need and understanding the processes 
in the generation of a problem. Formal quantitat 
procedures, however, are essential when the exten 
need has to be established. Obtaining accurate, 
up-to-date data on the size and distribution of a 
problem, such as illiteracy, is an important firs 
in planning. Assessment of need and of the conte 
which the need is prevalent will help define the 
problem. Needs assessment will also help determi 
size of a program and attendant costs, at least i 


Basic Research—Choice of Intervention 

The second question concerns whether anything can 
about the problem, and if so, what intervention a 
the most promising. Answers to this question dep 
largely on how much is understood about the probl 
what policy-related factors can be changed to aff 
Basic research is the activity that provides the 
to this question. Hence, long-range support for 
research on educational processes is critical for 
development of the fundamental ideas for educatio 
programs. For example, it is necessary to know w 
is a connection between socioeconomic level and t 
of learning of basic skills by children in order 
properly design programs to improve the learning 
among children from the lower socioeconomic level 
is also necessary to know how much such learning 
could be improved by changing teaching methods, b 
educational processes is often
However, basic research often
policy variables because basic
experiences of children. With
policy-relevant general resear
contract research programs wit
arises is whether a specific
Pilot testing of proposed pr< 
and demonstrations can often 
on whether and how such prog 
contract-learning experiment 
Economic Opportunity in the 
while some contractors could 
experiences, the program aro 
among teachers and school sy
* successful - 1 


because they are powerful. But because they are 
expensive, the scale should be relatively modest, 
great virtue of randomized controlled experiments 
they eliminate the possibility that effects may b 
by processes other than the intervention; hence, 
give a potentially useful program the most valid 
Moreover, program administration can be controlle 
ensure that the intervention takes place as inten 
Under such conditions, a program has the maximum 
of working: if it is not effective when carried 
under controlled conditions by dedicated research 
there is no reason to believe that it will work u 
conditions. However, a commitment to randomized 
experiments for testing programs should not miniir 
complementary potential of ethnographic studies a 
stage, particularly to document why a particular 
intervention succeeds or fails. 
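
The logic of random assignment described above can be illustrated with a minimal sketch. It is not drawn from the report: the function name, the outcome function, and the data are hypothetical, and the comparison of group means is only the simplest possible analysis. The point it shows is that when chance alone decides which units receive the intervention, a difference in average outcomes cannot be produced by how the units were selected.

```python
import random
from statistics import mean


def randomized_trial(units, outcome_fn, seed=0):
    """Randomly split units into two groups and compare mean outcomes.

    units      -- list of hypothetical study units (e.g., classrooms)
    outcome_fn -- hypothetical function (unit, treated) -> observed outcome
    seed       -- fixed seed so the illustration is reproducible
    """
    rng = random.Random(seed)
    shuffled = list(units)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)           # chance alone decides group membership
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    treated_mean = mean(outcome_fn(u, True) for u in treatment)
    control_mean = mean(outcome_fn(u, False) for u in control)
    return treatment, control, treated_mean - control_mean


# Hypothetical data: 20 classrooms with identical baselines and an
# assumed 5-point gain from the intervention.
classrooms = [{"id": i, "baseline": 50.0} for i in range(20)]
gain = lambda unit, treated: unit["baseline"] + (5.0 if treated else 0.0)
treatment, control, effect = randomized_trial(classrooms, gain, seed=1)
print(len(treatment), len(control), effect)
```

In real data the baselines differ across units, so the estimated effect varies from one random split to another; that variation is what statistical tests of an experiment are meant to quantify.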


Field Evaluation—Program Delivery 

Even if small-scale testing demonstrates a progra 
effectiveness, it should often be changed before 
widely adopted. The relevant question is how pro 
adapt a proposed program so that it will be effec 
when it is no longer under the control of researc 
specially trained personnel. Unless the program 
made to work in school systems and in the hands c 
personnel (or other intended service deliverers), 
not alleviate the problem it is supposed to addre 
matter how effective it was in the experimental s 
(Rossi 1979a). A process of mutual adaptation of 
takes place (Berman and McLaughlin 1975-78) that 
the program as carried out in a given site as muc 
site is changed by the program. Changes that ar« 
to be made by the people and institutions that wi 
responsible for program delivery must be understc 
built into the program in such a way that effecti 


et al. 1980) tested the notion of linKin 
to school attendance and performance. 

Randomized controlled experiments are 
extremely powerful tool at this stage; o] 
should be used to compare several altern* 
delivery. They should be accompanied by 
activities that use sensitive and observ. 
in close contact with field testing sites 
accounts can be extremely useful in unde) 
programs do or do not work as anticipated 
specifics vary from site to site, and whc 
impede or facilitate implementation. 


Policy Analysis—Program Efficiency 

Finally there is the issue of whether a program is 
efficient, a question that is answered through 
prospective policy analysis. Here the issues are how much 
the program will cost, how much service will be delivered 
at what level of cost, and whether the anticipated costs 
of the proposed program overshadow the anticipated 
benefits. Simulation and prospective analyses using 
data from small-scale tests and from field evaluations 
are inexpensive and ought to be performed before a 
program is enacted into law or widely adopted. 
the use of federal dollars; usually only the largest 
programs are audited by federal auditors. Fiscal audits 
tend to overlap with other forms of evaluation when 
questions are also asked about how the money was spent 
(not just whether it is accounted for). Since 
conventional accounting categories are generally not 
sufficiently sensitive to determine the level of services 
being delivered, the fact that funds appear to be 
appropriately spent in an accounting sense does not 
necessarily mean that program provisions are being 
carried out as intended. Fiscal accounts cannot 
establish program integrity, nor can such accounting 
establish the true cost of programs, since it does not 
consider hidden or opportunity costs. 


Coverage Accountability 

A significant substantive issue is whether a program is 
reaching the population that is intended to receive its 
benefits. It should be noted that this issue often turns 
out to be of considerable importance: not infrequently 
programs do not reach their intended beneficiaries, or 
they reach persons who were not intended to be 
covered (as was the case for Title VII bilingual 
education programs (Danoff 1978) and for the television 
program "Sesame Street" (Cook et al. 1975)), or both. 
Studies designed to measure coverage are similar in 
principle to those discussed under "Needs Assessment" 
above. An important source of data for this kind of 
evaluation is a program's administrative records, which 
often help to identify overcoverage where this is a 
problem. Undercoverage, however, may often involve 
special surveys. 


Implementation Accountability 
effectiveness estimates of a full year. 3 - This 
quantitative difference has to be translated into a 
qualitative difference when the decision to fund one 
rather than the other program comes into question. 

The critical effectiveness issue is whether a program 
does anything for its beneficiaries to help them advance 
towards the goals of the program. While it is relatively 
easy to measure the status of beneficiaries at any time, 
the difficult problem is to determine what their status 
might have been had they not participated in the 
program. An ideal solution to this problem is the 
randomized controlled experiment, which ensures that the 
people within the experiment who participate in a program 
are "identical" to the people in control groups who do 
not participate in the program. Randomized controlled 
experiments, however, are usually not feasible for 
studying programs that have been in operation for some 
time, since it is ordinarily not possible to find 
appropriate individuals who have not been exposed to the 
program to assign to control and experimental groups. As 
suggested above, such experiments are most appropriate in 
the program development phase. For ongoing programs, 
other techniques must be employed, such as comparing 
participants before and after a program has been enacted 
or comparing beneficiaries to those who do not receive a 
program's benefits. Such research and statistical 
techniques require extreme care; a large literature that 
is devoted to them warns of the many pitfalls in their 
use. 
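
The contrast drawn above between randomized assignment and comparisons of self-selected groups can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function names, the test scores, and the outcome model (every participant gains a fixed number of points) are hypothetical and are not drawn from any study cited in this report.

```python
import random
from statistics import mean


def randomized_experiment(baseline_scores, true_effect, seed=0):
    """Randomly split units into program and control groups and
    estimate the program effect as the difference in mean outcomes."""
    rng = random.Random(seed)
    units = list(baseline_scores)
    rng.shuffle(units)                      # random assignment
    half = len(units) // 2
    program, control = units[:half], units[half:]
    # Hypothetical outcome model: each participant gains `true_effect` points.
    program_outcomes = [score + true_effect for score in program]
    control_outcomes = control              # status without the program
    return mean(program_outcomes) - mean(control_outcomes)


def self_selected_comparison(baseline_scores, true_effect):
    """Contrast case: the higher-scoring half opts into the program,
    so participants and nonparticipants differ before the program starts."""
    ranked = sorted(baseline_scores)
    half = len(ranked) // 2
    nonparticipants, participants = ranked[:half], ranked[half:]
    participant_outcomes = [score + true_effect for score in participants]
    return mean(participant_outcomes) - mean(nonparticipants)


if __name__ == "__main__":
    scores = list(range(40, 60))            # hypothetical baseline test scores
    estimates = [randomized_experiment(scores, 5, seed=s) for s in range(500)]
    print("average randomized estimate:", round(mean(estimates), 2))
    print("self-selected comparison:", self_selected_comparison(scores, 5))
```

Under these assumptions the randomized estimates center on the assumed 5-point effect, while the self-selected comparison attributes the preexisting difference between the groups to the program.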


Policy makers should call for impact assessment only 
when circumstances warrant such studies (see below). 
They should be wary of requiring impact assessment from 
agencies that cannot marshal the necessary skilled 
personnel. They should be equally wary of requiring 
impact assessment, which is expensive to do adequately, 
without providing sufficient funds. In particular, only 
a few local and state education authorities have the 
capabilities or resources to competently carry out impact 
assessments; hence, such tasks should not be imposed on 
all state and local agencies without special attention to 
providing sufficient resources. 



goal. While these questions also arise during the 
planning phase of program development (see above), at 
this point in the process the answers are no longer 
anticipated costs and benefits but actual costs and 
benefits based on good estimates of effectiveness and 
field experiences with the programs. 

The main problem in answering such questions centers 
around establishing a yardstick for such an assessment, 
for example, dollars spent for units of achievement 
gained, for number of students covered, or for classes or 
schools in the program. The simplest way of answering 
questions of efficiency is to calculate 
cost-effectiveness measures, for example, dollars spent 
per unit of output. In the case of the "Sesame Street" 
program, several cost-effectiveness measures were 
computed, such as dollars spent per child-hour of viewing 
and dollars spent per additional letter of the alphabet 
learned (Ball and Bogatz 1970, Bogatz and Ball 1971). 
(Note that the second measure implies knowing the 
effectiveness of the program, as established by an impact 
assessment.) The most complicated mode of answering the 
efficiency question is to conduct a full-fledged 
cost-benefit analysis in which all the costs and benefits 
are computed. Relatively few full-fledged cost-benefit 
analyses have been made of social programs because it is 
difficult to measure all the costs and all the benefits 
in the same terms. In principle, it is possible to 
convert into dollars all the costs and benefits of a 
program; in practice, however, it is rarely possible to 
do so without some disagreement on the valuation placed, 
say, on learning an additional letter of the alphabet. 
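
The cost-effectiveness measures described above amount to a simple ratio of dollars spent to units of output. A minimal sketch follows; the function name and every figure in it are hypothetical, chosen only to show the arithmetic, and are not the values reported for "Sesame Street."

```python
def cost_effectiveness(total_cost_dollars, units_of_output):
    """Dollars spent per unit of output (lower means more output per dollar)."""
    if units_of_output <= 0:
        raise ValueError("units_of_output must be positive")
    return total_cost_dollars / units_of_output


# Hypothetical figures for illustration only; they are NOT the
# "Sesame Street" numbers reported by Ball and Bogatz.
total_cost = 8_000_000            # dollars
child_hours_viewed = 40_000_000   # output measure based on exposure
children_reached = 1_000_000
letters_gained_per_child = 2.0    # would have to come from an impact assessment

cost_per_child_hour = cost_effectiveness(total_cost, child_hours_viewed)
cost_per_letter = cost_effectiveness(
    total_cost, children_reached * letters_gained_per_child
)

print(f"${cost_per_child_hour:.2f} per child-hour of viewing")
print(f"${cost_per_letter:.2f} per additional letter learned")
```

The second ratio cannot be computed without an estimate of the program's effect, which is why that measure presupposes an impact assessment.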


WHETHER TO EVALUATE 

Implicit in the preceding discussion is the assumption 
that a program, prospective or enacted, can be evaluated 
in some way or another; however, that is not always 
true. There are some programs, whose characteristics are 
described below, that cannot be fully evaluated or that 
cannot be evaluated at all. 



have been detailed in laws or in regulations can also be 
evaluated as to whether they are being carried out as 
intended. But only programs that specify clearly the 
intended beneficiaries and the intended effects can be 
evaluated fully. This is not to say that programs with 
vaguely stated aims are not worthwhile; it is to say that 
they cannot be evaluated as to their effectiveness. 

Thus, a program that has the announced intention of 
enriching the cultural lives of high school students 
cannot be evaluated with respect to its impact because 
the aim of "enriching the cultural life" is simply not 
specific enough to provide criteria for judging 
effectiveness. In addition, the group of intended 
beneficiaries, high school students, is so broad and 
inclusive that one simply could not measure "effects" for 
all of them. 

A prime requisite for being able to evaluate the 
impact of a program is the existence of clearly 
designated, specific aims. But, as Wholey et al. 
(1975:89) note: 

As a natural result of the political process, 
federal programs usually have many poorly defined 
objectives. Authorizing legislation and program 
guidelines are generally vague about program 
objectives and priorities. . . . Policy-makers and 
managers often perceive that ambiguity about what 
constitutes success is an asset, permitting 
flexibility and helping ensure survival. 

This situation often puts evaluators in the position of 
setting goals or selecting among several stated goals. A 
program may have a number of diverse goals: for example, 
Head Start was intended to provide better health care and 
nutrition for poor children, improve their cognitive 
development, increase their social competence, improve 
the conditions of participating families and communities, 
serve as a focus for political action and community 
organization, and result in more effective functioning of 
other service agencies. (See, for example, Office of 
Child Development 1973.) In such cases, evaluators and 
those who commission evaluations must agree on which of 
the goals are most important to assess and whether they 
are sufficiently specific to permit an impact 
evaluation. Often, however, the problem of goal 
evaluability as a first step in evaluation rather than to 
assume that all programs can be evaluated. Indeed, we 
commend the Department for shifting some of its 
evaluation resources in this direction; so far, 10 
evaluability studies have been commissioned by the 
central evaluation unit of the Department. 


WHEN TO EVALUATE 

Even if a program is sufficiently specified to allow both 
accountability and impact evaluations, conducting impact 
evaluations may be inappropriate at a particular time 
because of the stage of program development or 
implementation. There are three phases in the life of a 
program that are notably inappropriate for impact 
evaluations. The first is during the program's 
development. We have suggested that a proposed program 
be tried out under actual field conditions after it has 
been proved to be effective in a controlled experimental 
setting. The purpose of this phase is to adapt the 
program so that it will be maximally effective under 
normal operating conditions. Obviously, impact (or 
summative) evaluation is totally inappropriate during 
this phase; at this point, evaluation should be used as a 
tool to fine-tune the program, not to judge it. 

The second phase is after a program has been enacted 
and is being put into operation. All programs require a 
shakedown period, during which program administrators 
develop regulations and operational procedures and 
teachers and school personnel (or other service 
deliverers) become familiar with the program's objectives 
and methods. The more complex a program, the greater the 
start-up problems. When a program allows flexibility and 
local choice, further time must be permitted for local 
decision making and development of specific features. 
Until a program has stabilized, it ought not to be 
evaluated, except for fiscal accountability. Too many 
negative findings have, in the past, been due to 
premature impact evaluation. Even accountability 
evaluations may be inappropriate in the early 
implementation stage, as demonstrated by findings on weak 
administration and even misuse of Title I funds in the 
first studies of the program, findings that did not hold 
up once personnel at the state and local levels had 
Instead of specifying methods, Congress should make sure 
that evaluators are clear about the questions to be 
answered. 

Figure 1 above identifies 10 kinds of evaluation 
activities. At least part of the charge that evaluations 
have been irrelevant to Congress's needs for information 
stems from the fact that Congress has often been 
interpreted to be calling for impact evaluation when in 
fact it desired only to know, say, how well a program was 
meeting its coverage requirements. A call for evaluation 
that does not specify what questions are being asked can 
lead to the mismatching of expectation and performance by 
Congress and the evaluators. While legislators might 
include the policy questions to be addressed directly in 
the legislative provisions for evaluation of a program, 
it may not always be possible to frame questions with 
sufficient specificity at the time evaluation provisions 
are being enacted, especially for new programs. In such 
cases, sufficient dialogue should take place between the 
legislators and the implementing agency and the 
evaluators to ensure that the evaluation will meet its 
intended objective (Berryman and Glennan 1980). 

Congressional mandates for evaluation should also 
identify the audience that is to be served by the 
legislated evaluation: Congress, beneficiaries such as 
parent or other interest groups, local program 
administrators, federal program administrators, and the 
like. The reasons for specifying audiences in any 
evaluation are discussed in greater detail in later 
chapters. The reason for including audience 
specification in this recommendation is that such 
specification will also sharpen the policy questions 
because different audiences tend to have different 
information needs. 

Though we recommend that it be specific with respect 
to question and audience, legislative language regarding 
evaluation should refrain from specifying details of 
method (such as sampling procedure or use of control 
groups) or of measurement. These are matters requiring 
careful technical consideration of specific evaluation 
conditions and contexts and should be chosen only after 
adequate planning and the application of expert knowledge. 





ty and appropriateness. 

One of the welcome procedural improvements in recent 
years has been the greater use of pilot tests of proposed 
national programs. The argument is often made that pilot 
tests and field evaluations are costly and time consuming 
and that an urgent social need cannot remain unaddressed 
while the ponderous process of research proceeds. But 
the urge to get programs off the ground without prior 
testing brings with it certain and often high costs: 
programs develop an array of self-interested suppliers 
pilot test of the proposed evaluation. Such a pilot test 
can be used to find out what measurements can and cannot 
be made of program benefits, how programs should account 
for and measure costs, which testing instruments and 
procedures are disruptive and which are not, how large a 
sample of beneficiaries is needed to get valid program 
measurements, and so forth. If a pilot test of the 
evaluation is carried out in conjunction with the pilot 
test of the program, the design of both the program and 
of the evaluation requirements will be strengthened. 
Indeed, if evaluation requirements are not pilot tested, 
it is difficult to see how those charged with evaluation 
responsibilities at the local and state levels are to be 
held accountable. 


STANDARDIZATION OF METHODS AND MEASURES 

As indicated in the preface to this report, one of the 
missions given to the Committee was to make 
recommendations and proposals ". . .to ensure that 
evaluations are based on uniform methods and 
measurements." The Committee's major contribution to 
this goal is to attempt to develop a terminology for the 
various kinds of evaluation activities, as discussed 
above, and to match evaluation questions with appropriate 
research approaches. However, we believe that to proceed 
any further with specific recommendations for attaining 
uniform procedures and measurement is a premature step at 
this stage in the development of evaluation. 

At the present time, the science and art of evaluation 
is in a state of considerable change and improvement. 
Each of the social science disciplines has made 
contributions to the procedures now used, and while there 
is some agreement on the rough preference ordering of 
procedures to address a set of policy questions, the 
rapid rate of development along with considerable 
diffusion of methods from one field to another means that 
today's preferences may be superseded by tomorrow's more 
mature understanding of the proper fit between problem 
and method. In addition, evaluation activities are being 
undertaken in a variety of substantive areas—not only in 
education, but in manpower training, energy conservation, 
health services delivery, child care, public welfare 
payment plans, criminal justice procedures, and so 



field of evaluation. 

The Committee believes that, while the goal of 
attaining uniformity in evaluation methods and measures 
is an extremely desirable one, it cannot be attained at 
the present time without prematurely inhibiting further 
advances in the field of evaluation and stopping it short 
of needed development. The recommendation below that the 
National Institute of Education (NIE) continue and 
strengthen its program of support for basic research in 
evaluation methods is made in part to accelerate full 
development of the field of evaluation. 
school system), and the evaluators. When the sponsor is 
a federal agency, there are three control points within 
the agency: the evaluation monitor, the contracts 
office, and the manager of the program being evaluated. 
The complexities created by these multiple organizational 
relationships create constraints for any study, and those 
constraints have been given little attention. Our own 
limited findings related to such issues are reported in 
the next three chapters; those findings make it clear 
that the evaluation process must be better understood if 
it is to yield good results. 

The National Institute of Education should encourage 
work in the noted areas of methodology and process as 
part of its evaluation research program. Furthermore, 
with rare exceptions, when a specific methodological 
question must be addressed in a given time frame or the 
process of a specific evaluation is to be studied, all 
such research should be carried out through a competitive 
grants program that specifies the areas of interest but not 
the approach to be taken. 


NOTES 


1 Success here is defined in terms of the objectives of 
the program. It is quite possible that a program 
successful with respect to its own objectives may be 
educationally undesirable. For example, perhaps more 
time was spent on a targeted skill and so some other 
important skill was neglected and hence less developed 
than it would have been in the absence of the 
program. To gauge the overall educational 
contribution of a program, it is necessary to assess 
such negative as well as the positive effects. 

2 A good deal of knowledge that can be applied to 
program improvement may, in fact, be gained through 
documenting program variations and their effects. A 
panel of the National Research Council's Committee on 
Child Development Research and Public Policy is 
currently reviewing outcome measurement in early 
childhood demonstration programs. Given that local 
program variation is encouraged by many early 






3 

Quality of Evaluation 


Knowledge about the quality of evaluation studies in 
education is limited. It comes from three sources: 
technical critiques and reanalyses of specific (usually 
large-scale) studies, a few scattered reviews of some 
samples of evaluations, and analyses of the influence of 
the political context on the quality of evaluations. The 
effects of the managerial context on quality (how 
evaluations are commissioned and carried out) have 
received considerably less attention. Yet the level of 
funding, what types of organizations usually perform 
evaluation studies, and the availability of adequately 
trained individuals all influence the quality of 
evaluations. In addition, procurement procedures can 
encourage or discourage creativity, and 
interorganizational complexities can introduce delays 
that often have deleterious effects on the course of a 
study. 

There are several dimensions to the issue of quality. 
Evaluations can be competently done but not be very 
creative. They can be imaginatively done but be sloppy 
on some points. The various standards for evaluation 
work recently developed by a number of groups (Joint 
Committee on Standards for Educational Evaluation 1980, 
U.S. General Accounting Office 1978, 1979, 1980b, 
Evaluation Research Society 1980) may be useful to the 
profession, but since any major evaluation is a 
customized task, they cannot resolve quality issues in 
any specific instance. Furthermore, quality is 
inevitably subjective, especially in an activity such as 
evaluation for which facts and values are inextricably 
linked. For these reasons, the Committee's 
not necessarily inhibited the use of evaluation 
findings. Datta (1979) analyzes an interesting example 
of a study on the effects of federal education programs 
(Berman and McLaughlin 1975-78) whose summary findings 
were widely accepted and applied in policy formulation 
without question, although later examination revealed 
considerable problems with some of the summary 
conclusions and the interpretations they had been given. 


Reviews of the Field 

Aside from the critiques of some landmark studies, there 
have been few systematic reviews of the quality of 
evaluations, such as assessments of representative 
samples of studies published during a specified time 
period or resulting from the activities of a particular 
sponsor or group of performers. In an early study, 
Bernstein and Freeman (1975) started with 236 studies 
from fiscal 1970, of which they ruled out 84 as not being 
comprehensive, i.e., not measuring both process and 
impact. Using criteria oriented toward quantitative and 
experimental methodology, they found only 27 of the 
remaining 152 studies to be of high quality, less than 20 
percent; 76, or 50 percent, were deemed to be of low 
quality. Minnesota Research Systems, Inc. (1976) 
examined 110 research studies (about 45 percent of which 
were classified as evaluations) funded by the U.S. 
Department of Health, Education, and Welfare (HEW) and 
completed in 1973 and 1974. Less than 10 percent were 
deemed to be free of significant methodological flaws. 
Moreover, they found that in 90 percent of the cases the 
flaws already existed at the proposal stage. 1 
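
The proportions quoted from Bernstein and Freeman (1975) can be checked directly from the counts given in the text. The short sketch below does only that arithmetic; the variable names are ours, and no figures beyond those stated above are assumed.

```python
# Counts as stated in the text for Bernstein and Freeman (1975).
studies_screened = 236
not_comprehensive = 84
high_quality = 27
low_quality = 76

# Studies that measured both process and impact.
comprehensive = studies_screened - not_comprehensive

share_high = high_quality / comprehensive
share_low = low_quality / comprehensive

print(f"comprehensive studies: {comprehensive}")
print(f"high quality: {share_high:.1%}")
print(f"low quality:  {share_low:.1%}")
```

The counts are consistent with the text: 152 studies remain, about 18 percent of them (less than 20 percent) are rated high quality, and exactly half are rated low quality.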

The size and the scale of evaluation studies have 
grown considerably since the early 1970s, but problems of 
quality appear to persist. Rossi (1979b) reports on an 
examination, done over 3 years for the Summer Evaluation 
Research Institute at the University of Massachusetts, in 
which several hundred requests for proposals (RFPs) were 
screened to look for those likely to lead to a sound 
research plan. Using that criterion, less than a dozen 
(i.e., state officials rated state reports more highly, 
local officials rated local reports more highly), even 
the most favorable ratings considered only two-thirds of 
the reports adequate or better, and in the least 
favorable cases (state views of local Title I reports), 
barely one-third were considered to be adequate or 
better. Among other recommendations, the GAO requested 
that the Office of Education review the program 
information collected in local agency evaluation reports 
in order to determine whether such information could be 
aggregated to serve the different needs of federal, 
state, and local governments. 

In the second study, focused on evaluation carried out 
at the local level, Lyon et al. (1978) reviewed 116 
studies for the presence or absence of criteria 
considered to be necessary elements of an evaluation. As 
Boruch, Cordray, and Pion note (Ch. 5:7 in Boruch and 
Cordray 1980), the Lyon study "suggests that simple 
standards are not often adhered to." Holley (Appendix C) 
comments that among the possible reasons are insufficient 
evaluation funds, insufficient control of the funds and 
often of the evaluation activities themselves by program 
administrators, and lack of training and experience of 
many of the personnel who are assigned evaluation 
responsibilities. 


The Political Context 

One of the sources of disappointment with evaluation is 
that it appears not to have contributed as effectively as 
hoped to the making of decisions about programs. At 
times, this lack has been attributed to the inadequate 
quality of many evaluations. More recently, however, the 
analytic literature dealing with the contributions and 
failures of evaluation has reflected a considerable shift 
regarding the potential for decision making offered by 
program evaluation. Such early studies as the 
Westinghouse-Ohio evaluation of Head Start (Cicirelli and 
Granger 1969) were in part condemned for a narrow choice 
of outcome measures that did not adequately reflect 


dilemma of program legislation that may be specific on 
process but is vague on intended objectives, yet mandates 
evaluation. Rossi et al. (1979) have suggested that 
program goals should be spelled out specifically enough 
to allow impact assessments; more recently, he and Chen 
(1980) have argued that researchers cannot simply accept 
official goals but must learn how to interpret programs 
and their likely effects more accurately in order to 
design evaluations that are sensitive to program impact. 
Wholey, when he was Deputy Assistant Secretary for 
Evaluation of HEW, introduced the notion of evaluability 
(see Appendix A) whereby short-term, exploratory 
evaluations would determine the operational objectives of 
a program and whether they could be measured (Wholey 
1979); if they could not, costly impact assessments would 
not be commissioned. Cronbach et al. (1980) argue that 
the quest for specification of goals is futile and that 
evaluation is a prospective activity better suited to 
understanding processes and events for future program 
formulation than for retrospectively appraising the 
performance of programs against predetermined objectives. 

There is more agreement on the role of the evaluator 
in the decision-making process, namely, that the 
information developed through the processes and by the 
canons of social science is, and should be, only one of 
the determinants of policy regarding education (or any 
other social) program decisions. Arguments deriving from 
research on how evaluation findings are used (Caplan 
1977, Alkin et al. 1979) have led to recommendations that 
evaluations, to be useful, must be done in close 
cooperation with the intended user and must also involve 
a process of negotiation that draws on the views of 
beneficiary and constituency groups. However, such a 
process is often counter to the objectivity considered to 
be a hallmark of quality evaluation. According to 
Schreier (1979), it pits the insider's (e.g., client's, 
teacher's, program manager's) intuitive perception 
against the outsider's concern with quantitative 
assessment. The result is that they are unlikely to 
agree on goals. The focus of evaluation may then shift 



Over the last decade, evaluation of education programs 
has become big business, and this has had an impact on 
quality. When the first legislative mandate for 
evaluation was written into law as part of the 1965 Title 
I (ESEA) legislation, evaluation was considered to be an 
activity carried out at the local level for 
accountability and to improve the program. Every year 
thereafter, local evaluation activities were initiated 
for a number of programs, usually coordinated by an 
evaluation specialist within the federal program office. 
As the number of activities grew, concern with quality 
and need for generally applicable procedures led to the 
establishment in fiscal 1970 of a central evaluation unit 
in OE (see Appendix A). 


Funding 

Before fiscal 1970, the Office of Education had about 
$1.25 million per year available for central evaluation. 
In that year, for the first time, there was a separate 
line item for evaluation. The peak funding for the 
central evaluation unit was reached in 1978, with $29.7 
million obligated for evaluation contracts. In 1980, the 
amount had decreased to $19.4 million. The most 
precipitous drop within the unit came in evaluation funds 
for discretionary purposes, i.e., not earmarked for a 
specific title: these funds dropped from $7.1 million in 
1977 and 1978 to $3 million in 1980 (U.S. Department of 
Health, Education, and Welfare 1979b). 
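
To put the funding figures above on a common footing, the declines can be expressed as percentages of the earlier amounts. The sketch below uses only the dollar amounts given in the text; the helper function name is ours.

```python
def percent_decline(earlier_millions, later_millions):
    """Decline from the earlier to the later amount, as a percentage of the earlier."""
    return 100 * (earlier_millions - later_millions) / earlier_millions


# Amounts in millions of dollars, as given in the text.
contracts_decline = percent_decline(29.7, 19.4)      # 1978 peak to 1980
discretionary_decline = percent_decline(7.1, 3.0)    # 1977-78 to 1980

print(f"evaluation contracts: down {contracts_decline:.1f} percent")
print(f"discretionary funds:  down {discretionary_decline:.1f} percent")
```

On these figures, obligations for evaluation contracts fell by roughly a third, while discretionary evaluation funds fell by more than half.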

According to Reisner's estimate (Appendix A), in 
fiscal 1980 the Department of Education was planning to 
spend some $40 million on a variety of evaluation 
activities, half of the work being carried out by the 
central evaluation unit and nearly a quarter by the 
Inspector General. If one wishes to calculate the total 
amount spent for program evaluation in education, that 
estimate needs to be augmented by the amount spent by the 
General Accounting Office (estimated at $2.5 million) and 
an unknown amount of federal funds devoted to evaluation 
Personnel 

Evaluation is a relatively new field that is to a 
significant degree staffed with individuals recruited 
from other fields. This newness creates a critical 
quality problem at the state and local levels (see 
below), but important gaps exist throughout the 
evaluation enterprise. Of specific concern are the 
underrepresentation of minority group members in 
educational evaluation, the communication barriers 
between evaluators and administrators, and the failure of 
individuals charged with evaluation responsibilities to 
keep up with developments in the field. 


Toward Equal Educational Opportunity 

In order to further the national commitment to equal 
educational opportunity, nearly 80 percent of federal 
education programs are targeted for racial, ethnic, 
handicapped, and other minority or disadvantaged groups. 
And if federal programs are to provide more effective 
educational services for these groups, consistent input 
on their needs must be part of the evaluation process. 

An examination of social science research over the last 
40 years (Gregg et al. 1979) shows how research questions 
have changed in those fields—and those fields only—in 
which the subjects of inquiry have participated actively 
in defining the problems. Though talent and skill remain 
the prime requisites for evaluation personnel, the 
perspective that comes from being a member of the 
recipient group augments the evaluation process in 
important ways. Thus, one can look at bilingual 
education from the viewpoint of society as a whole, of 
the classroom teacher, or of the non-English-speaking 
child and family. Women, blacks, and other minorities 
have helped give a different cast to educational research 
individuals in performing and reviewing evaluations. Our 
first recommendation addresses the issue of the talent 
pool, since unless it is expanded minority participation 
in evaluation will continue to remain limited. At the 
same time, the recommendation considers some additional 
gaps in the training of evaluation personnel that must be 
remedied if the quality of evaluations is to improve. 


Training 

Recommendation D-4. The Department of Education should 
provide funds for training programs in evaluation to 
increase the skills of individuals currently charged with 
carrying out or using evaluations and to increase the 
participation of minorities. 

This recommendation covers three training needs that 
require extramural support: recruitment and training of 
minority individuals; training to improve the 
communication between evaluator and the user of 
evaluations; and training for those currently involved in 
evaluations. Two related issues are covered in other 
recommendations: broader technical assistance to state 
and local agencies is discussed later in this chapter, 
and intramural training for federal evaluation and 
program staffs is discussed in Chapter 5. 

After 15 years, the rationale that there are no 
minority researchers available to help evaluate education 
programs is not tenable. Their absence is particularly 
marked, and particularly detrimental, at the senior 
levels of both sponsoring and performing organizations. 
There are increasing numbers of minority persons in 
training in Ph.D. programs in social and behavioral 
sciences, in part because of numerous federally sponsored 
fellowship programs. 6 These social and behavioral 
science graduate students very often express interest in 
"applied research," but do not often have an opportunity 
to learn about it. They represent a sizable pool of 
potential evaluation researchers who could staff 
positions in the Department of Education, who could 
advise and consult with local and state evaluation 
groups, and who could work with universities and private 
consulting (including 8-A) firms in carrying out 
evaluations. Fellowship and internship programs in

preparation than that of classroom teaching. These
practicing evaluators need opportunities to upgrade and 
improve their skills. (See Appendix C for details on 
training needs among local personnel and on some possible 
programs.) Insofar as new evaluators continue to be 
recruited, graduate-level training programs for 
evaluators will continue to need support. In part, such 
training would occur automatically through greater 
participation of the academic sector in evaluation work 
sponsored by the Department. 

The suggestions in this recommendation require the 
funding of extramural training and fellowship programs. 
One channel for such programs might be the Assistant 
Secretary for Educational Research and Improvement,
either through the Office of Dissemination and
Professional Improvement or through the National
Institute of Education, which already runs a program to 
increase the participation of women and minorities in 
educational research and development (R&D). 

Congressional authorization for such programs already 
exists, at least for NIE, in the 1980 Higher Education 
Amendments (P.L. 96-374), and in the Special Projects
Act, though the latter requires that Congress be notified 
before a program is initiated. 


Interorganizational Complexities 

There is an important difference between most social 
science research and evaluation. In most research, 
control of a study is mainly in the hands of the 
researchers: they decide what to study and how the 
research is conducted. Even when action sites like 
schools are involved, the researchers select them on the 
basis of the intended research design, and if some sites 
are unwilling to cooperate, others can be substituted. 
The funding agency's role is usually limited to 
negotiating grant amounts and requiring nominal progress 
reports. 

In evaluation, the researchers share control to a 
considerable extent with two other parties—the 




within the program, how far along it is in the
implementation process, how much freedom is given to
individual sites to carry out their own miniprograms.
Third, the research team must work with a specific set of
action sites. In order to establish workable
relationships with action sites that may be reluctant
participants, the researcher must provide a set of quid
pro quos, such as collecting data not necessarily
relevant to the evaluation study but wanted by people at
the site, providing technical assistance, or carrying out
special analyses. Moreover, neither the action site nor
the sponsor is a monolithic entity, and different
requirements and constraints may be imposed by different
organizational units within each. Of particular
importance is the increasing fragmentation of
responsibilities within federal executive agencies (the
usual type of sponsor), in which at least three parties
may have some influence over the design and conduct of
research: the project monitor for the evaluation study
itself (and the cognizant evaluation unit), the program
manager and responsible office for the program being
evaluated, and the contracts office. The resulting
context for evaluation is depicted schematically in
Figure 2 (see Yin 1980).

The quality of evaluations is subject to the marked
constraint imposed by the need for researchers to work
within these interorganizational complexities: each
decision has to be negotiated and agreed to by a number
of parties. If nothing else, the process of arriving at
compromises acceptable to all parties is time-consuming,
often to a degree that makes the original study design no
longer feasible; this is especially true during the
procurement phase and the implementation phase.

FIGURE 2 The interorganizational complexities of
evaluation research.

The low participation of the academic sector in
evaluation work should not be surprising, even though
academic organizations represent the largest single group
of performers of all educational research (Appendix
B:Table B-4), because of the process by which evaluations
are procured by the federal government. That process has
become more and more complex over the decade of growth in
evaluation funding. Requests for proposals (RFPs) have
become longer and more detailed: in addition to spelling 
out basic design, methodology, what to measure and how to 
measure it, they may specify the sites to be studied, the 
data elements to be analyzed, and the time intervals for 
different collection steps. Responders have little 
freedom to formulate research approaches they consider 
more appropriate, let alone to reframe evaluation 
questions. Moreover, the average response time allowed 
hardly permits such luxury: for eight of the ten RFPs 
issued for new studies in fiscal 1980 by the Office of
Program Evaluation (the central evaluation unit for the 
Department of Education), proposals were due only 1 month 
after issuance of the RFP; for the other two RFPs, 
proposals were due in 6 weeks (see Table 1). The 
proposed length of time for these studies ranges from 
18 months to 2-1/2 years and their projected cost ranges 
from $150,000 to $2 million. The largest of these 
studies, which comprises a whole series of substudies of 
the implementation of Title I at the state and local 
levels, is estimated to take 2-1/2 years and cost $2 
million. The RFP for this study was issued on July 23; 
proposals were due 29 days later, on August 22. 7 





^Originally planned for 9/30/80, postponed until fiscal 1981. 



Office of Program Evaluation in the spring of 1979 were 
not approved until January of 1980; some studies were not 
approved until May. Therefore, except for two RFPs that 
had been held over from fiscal 1979, no work statement 
could be completed until March, and a number were delayed 
until June or July by further review within the Grant and 
Procurement Management Division, the Department's 
contracts office. Thus, seven of ten planned awards for 
new studies were not scheduled until September, at the 
very close of the fiscal year. 

Institutions whose business is based on federal 
contracts resulting from RFPs and who have considerable 
staff resources assembled at any point have an obvious 
advantage when responses must be made in such a time 
frame. The recent change in the federal government's 
fiscal year has positioned many complex procurement 
actions in the summer quarter, a period during which 
academic institutions are even less likely to be able to 
respond quickly. Contract records substantiate Sharp's 
findings (Appendix B) that universities and small-scale 
performers are largely shut out of the types of studies 
($100,000 and over) that have been in favor. Of 84 
contracts for evaluation and planning awarded by the 
central unit in 1979, only 1 went to a university, in the 
amount of $350,000 of a total of $21,526,089 in awards. 

On the other hand, one for-profit firm received four 
contracts for a total of more than $5 million. Nineteen 
contracts to three private firms and one large regional 
laboratory (also a private corporation) 8 accounted for 
50 percent of all funds awarded. Through their success 
in responding to evaluation RFPs, the private performer 
organizations have been able to accumulate "large, 
sophisticated, multidisciplinary staff which are very 
knowledgeable about the major educational issues of the 
day" (Sharp, Appendix B:241). Whether current 
procurement procedures with their tight deadlines and 
enormous response burdens serve to deploy effectively the 
talent pool in even this limited domain is open to 
question. The reviews of evaluation proposals cited 
earlier in this chapter are not reassuring about the 
quality of responses elicited by the procurement process. 
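
The concentration of awards described above can be checked with a short calculation. The sketch below uses only the figures quoted in the text for 1979 (84 contracts, $21,526,089 in total awards, one university contract of $350,000, and 19 contracts accounting for 50 percent of funds); the variable and function names are illustrative assumptions and are not drawn from the contract records themselves.

```python
# Figures quoted in the text for 1979 awards by the central evaluation unit.
TOTAL_AWARDS = 21_526_089     # dollars awarded across all contracts
TOTAL_CONTRACTS = 84          # evaluation and planning contracts awarded
UNIVERSITY_AWARD = 350_000    # dollars in the single university contract
TOP_GROUP_CONTRACTS = 19      # contracts held by three firms and one laboratory
TOP_GROUP_FUND_SHARE = 0.50   # share of all funds, as stated in the text


def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole


# Share of all award dollars that went to the one university performer.
university_share = share(UNIVERSITY_AWARD, TOTAL_AWARDS)

# Dollar amount implied by the stated 50 percent share of funds.
top_group_dollars = TOP_GROUP_FUND_SHARE * TOTAL_AWARDS

# Share of the contract count held by the same four performers.
top_group_contract_share = share(TOP_GROUP_CONTRACTS, TOTAL_CONTRACTS)

print(f"University share of award dollars: {university_share:.1%}")
print(f"Dollars implied for the four largest performers: ${top_group_dollars:,.0f}")
print(f"Their share of the contract count: {top_group_contract_share:.1%}")
```

Under these figures the university share of dollars is well under 2 percent, while the four largest performers hold less than a quarter of the contracts but half of the funds.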


instruments, and the analysis plan from a variety of
perspectives: burden on respondents, technical quality,
need to know (defined as being required by law), and
economic impact. Not infrequently, research designs and
instruments that are the product of experts and that have
been pilot tested are changed by reviewers who do not
have equivalent expertise or field experience. If a
study is to be done at all, many compromises have to be
made along the way by the contractor and federal monitor.
In 1978, a new requirement was added to the clearance
process, namely, that all test and data collection
instruments to be used in a study must be described in
the Federal Register (and available on demand) by
February 15 previous to the school year in which the
information is to be collected. 9 This requirement,
when added to all the other clearance machinery, so
compresses the time available for development of

The three recommendations below are aimed at introducing
greater creativity and competence into the evaluation 
process during three stages: procurement, while a study 
proceeds, and after completion. 


Recommendation D-5. The Department of Education should 
structure the procurement and funding procedures for 
evaluations so as to permit more creative evaluation work
by opening up the process and allowing a period for
exploratory research.

The increasing constraints imposed as a result of the 
greater visibility of evaluations and the attempts to 
control their management and process have limited 
contributions from the field of evaluation. These 
constraints have reduced the opportunity for infusing 
novel approaches into either programs or evaluations.
They have also reduced the potential of evaluation to
contribute to the policy process.

The more complex the evaluation, the less likely is it 
that anyone can spell out ahead of time the best methods 
for addressing the questions that the evaluation is 
designed to answer. The current RFP process in 
particular ignores this fact. The Committee believes 
that this process can be made more flexible. An RFP 
often presumes some things about the program are known 
when they are not. This can range from something 
fundamental—e.g., existence of the program at a site—to 
something trivial—e.g., existence of records. RFPs also 
often downplay the possible effects of interorganizational
relationships on the evaluation process. In
addition, problems and issues in executing the evaluation 
are not anticipated, and many cannot be anticipated. The 
unknowns or unknowables suggest that an RFP that attempts 








benefit of 1 year of planning for his national 
longitudinal study of the high school class of 1980 
(Coleman et al. 1979). That planning included intensive
research on what kinds of policy issues could be 
addressed in the future using such data. As another 
example, the NIE compensatory education study (National
Institute of Education 1976) had 6 months to clarify 
questions before the study was initiated. 

Mechanisms for providing opportunity for expertise in
evaluation to improve the quality of evaluations include:


• inviting bidders to specify alternative methods
of evaluating the program at hand and how such methods
would be tested, in addition to asking that they meet
formal RFP requirements;

• inviting bidders to design small side studies
that can lead to durable general statements about
particular approaches and providing support for those
side studies found to be meritorious;

• assuring that sufficient time is available for
developing proposals for an evaluation project, at least
6 months for complex evaluations;

• issuing RFPs for pre-evaluation assessments that
... the problem better, lay out alternative approaches
to evaluation and how they might be assayed, and so forth

universities will not and should not participate in
carrying out evaluations. The academic world is no more 
monolithic than any other community; within many 
universities, there are institutes or centers created
precisely to respond to the interdisciplinary challenges 
of applied social science research. In addition, as 
funding for basic research has leveled off or even 
decreased, academic researchers have become more 
interested in applied work. The dismal statistics on 
lack of participation by universities in evaluations 
funded by the Department cannot be attributed solely to 
the unwillingness of universities to participate. 

By depending almost entirely on the competitive RFP 
procurement system, the Department is not able to take 
advantage of the creativity, objectivity, long-term 
commitment, and the cumulative knowledge and experience 
of the academic community. Nor can it attract 
participation by minority researchers, whose perspectives 
would enrich the questions and methods of evaluations, 
who are not able to assemble the resources needed for 
large studies in the time provided. Local and state 
agencies also cannot often contribute at the national 
level, even when they have the capability for high- 
quality work, because of the site requirements in many 
RFPs. Among the mechanisms for funding evaluations that
can be used to open up the process and improve quality 
are unsolicited proposals, sole-source awards, 8-A 
contracting, cooperative agreements, 10 basic ordering 
agreements, 11 and grants. 

The Department should consider unsolicited proposals 
in order to encourage creative and innovative ideas that 
may be lost through the RFP system. Academic experts who 
have made significant contributions to the evaluation 
process should be encouraged to submit proposals that 
attempt to break new paths in theory or measurement of 
the effectiveness of education and other social 
programs. It is possible to carry out a competitive 
program of grant awards for unsolicited proposals in 
specified areas, as practiced by the National Institute 
of Education. 



evaluation whose background, experience, and expertise
cannot be matched. The use of this mechanism will help
to open up the system to new ideas and contribute some
needed flexibility to the Department's evaluation
activities. The Committee is fully aware of recent
criticisms of consulting and sole-source procurement
(U.S. General Accounting Office 1980a, Gup and Neumann
1980, but see Wilson 1980). We believe, however, that
the limited and judicious use of this mechanism can
produce gains that far outweigh the risk of occasional
abuse. When abuse does arise, it should be dealt with on
a case-by-case basis, not by abandoning a useful
procurement mechanism.

The restrictiveness of the RFP process also
contributes to the very low use of minority firms by the
Department in securing evaluation contracts. Such firms
are usually small and have limited staff and so they
cannot respond as quickly to RFPs as the larger
for-profit organizations that now dominate the evaluation
field. The 8-A contracting process seems to be seldom
used as a way of involving more minority firms, probably
because evaluation studies have tended to be large,
and 8-A firms are small. The issue of equal educational
opportunity that calls for the greater use and
involvement of minority researchers will only be resolved
when more flexibility is built into the design of studies
and the contracting process.

Cooperative agreements ought to be the mechanism of
choice when the principal purpose of the award is to
benefit local or state operation of education programs
authorized by federal statute. Such agreements may
be used when substantial involvement is anticipated by
the federal agency as well as by the recipient of the
funds. Studies carried out by a state or local agency to
document program processes, improve program
implementation, or test program alternatives are intended
to benefit the locality, but they can also help improve
the program nationally. The former Department of Health,
Education, and Welfare had an internal decree against



Law Enforcement Assistance Administration of the 
Department of Justice. The Department of Education 
should exploit the potential of this procurement 
mechanism. Cooperative agreements are an obvious vehicle 
for encouraging local and state agencies that have the 
capacity to undertake evaluation work aimed at program 
improvement. 

Basic ordering agreements are a particularly useful 
mechanism for planning or evaluability studies and other 
limited work with a short time horizon. The Department 
could obtain greater flexibility and faster turn-around 
time by maintaining lists of qualified performers 
generated through periodic requests for qualifications 
(RFQs). These performers could then be called upon for 
limited studies. 

Grants are a particularly appropriate mechanism when 
creativity from the performer is important. The 
Committee urges that the Department institute at least 
two grant programs, one for local and state agencies (see 
Recommendation C-3 below) and a small grants program 
($50,000-100,000 per grant) to allow university 
researchers and others to pursue evaluation questions in 
designated areas of interest to the Department. The 
small-grants program should be run in conjunction with 
the research program at NIE suggested in 
Recommendation D-3 (in Chapter 2). Research grants are 
often considered to be appropriate only when the primary 
audience is to be other researchers and hence are 
considered inappropriate for policy-related research. But
grant programs do not have to be untargeted, as is 
demonstrated by the well-defined grant programs developed 
by the various study sections of the National Institutes 
of Health and of Mental Health. Not infrequently, the 
research is both applied and immediately applicable, as 
in the case of the restorative materials program funded 
by the National Institute of Dental Research. 

The state and local program we are recommending could 
be in the form of grant awards or cooperative 
agreements. The purpose would be to allow selected 
agencies to study their own federally supported programs 
by documenting what actually goes on in the program at 
the classroom or school level, assessing the effects of 
the program or some of its components, and testing 
alternative program interventions. There should be 
national or regional competitions for each large federal

contribution and involvement from those most affected by
the programs (beneficiary groups, teachers, etc.), and 
making use of the findings more likely through public 
exposure and understanding. 

For major national evaluations of important programs, 
the evaluation plan should be publicized by the agency 
before the project begins. When the RFP process is used,
the agency itself should solicit as much outside advice
as possible, through development of concept papers,
planning conferences, and other pre-RFP activities. 
Proposal review should include experts from outside the 
sponsoring agency. After award of a contract, the 
contractor also should solicit the views of outsiders. 
Some questionable assumptions or pedestrian analytical 
approaches might be amended at this point. Then, when 
the project is done, outsiders should again review the 
work, its philosophical perspective, its technical 
ambiguities, and its policy implications. Such outside 
review would be facilitated if researchers were careful 
to spell out, in final reports, the limitations of their 
research: "... what went wrong, what couldn't be done,
what that means for the conclusiveness of the findings
and . . . for their generalizability to particular
populations" (Chelimsky 1978). Later on, the data from
the evaluation should be made available to others for 
reanalysis. If evaluations are controversial, either 
because of their execution or because of their 
recommendations, this process will allow such 
controversies to be aired. All of the results of this 
interchange, the evaluators' reports and the comments of 
outsiders, should then be made broadly available. 

There may be several ways to ensure adequate input and 
broad availability. One approach worth exploring is for 
the Department to sponsor an annual conference on 
important evaluations that are at various stages in the 
process—design, first completion, reanalysis. If this 
were done, the educational community would know where to 
look for the latest evaluation results, criticisms, and 
reanalyses, as well as for information about impending 
work. 

In line with previous remarks about the subjective 
nature of evaluation quality, opening up the evaluation 
process should provide mechanisms similar to those 
employed by such journals as Consumer Reports with regard 
to the market for consumption goods. The Department

confidentiality of information. Audit agencies such as
GAO, or independent researchers, may have a legitimate 
interest in verifying quality of data generated in an 
evaluation. The process need not and should not breach 
promises of confidentiality made to individual 
respondents or invade their privacy. A report 
commissioned by the GAO on assessing evaluation quality 
(Social Science Research Council 1978) recognizes the 
additional needs of avoiding needless disruption of 
research and harassment of respondents. The report 
recommends several alternatives to the usual way of 
reinterviewing respondents including: independent 
sampling of the target population to compare statistical 
results obtained by the auditor with statistical results 
obtained by the evaluator; use of evaluators independent 
of both original evaluation staff and audit staff for 
reinterviews; drawing a subsample of the original sample 
for reinterview to minimize disruption of the research; 
and other strategies. In many instances, regathering of
primary data is unnecessary: review of design, 
execution, and analysis is sufficient for judging the 
quality of major program evaluations (see also Hedrick et 
al. 1979). The critical point is that original 
evaluation information not be withheld by researchers, 
sponsors, or any other parties; the more such information 
is available, the less intrusive can be the approach 
taken in reanalysis and critical appraisal. 


STATE AND LOCAL ACTIVITIES 

Funding and Independence 

The amount of federal money spent for evaluation 
activities at state and local levels is not 
inconsiderable. Webster and Stufflebeam (1978; see 
Appendix C:Figure C-3) found that 35 large urban school 
districts spent a total of nearly $34 million on research 
and evaluation, of which $21 million (or more than 
two-thirds) was federal funds. But funding for

evaluation activities will be suspect. If evaluation is
to be an independent function that can provide an outside 
view of program operations and effects, it must be 
separately funded. 

As a specific way of accomplishing the separation,
Congress may wish to consider a required percentage
set-aside for each program that would be devoted to 
evaluation activities at the state and at the local 
levels, with due consideration of thresholds below which 
no activity can be carried out adequately. Such a 
set-aside provision should be accompanied by reporting 
requirements that account for the money spent and that 
summarize evaluation results and their application. Over 
time, it will then be possible to judge whether the 
investment in evaluation is yielding the desired results 
in terms of program monitoring and improvement. 


Capability 

The competence and resources of the personnel charged 
with evaluation responsibilities constrain their ability
to produce evaluations of acceptable quality. Only some 
school districts, particularly the large urban or 
suburban systems, have well-trained and sophisticated 
evaluators. For many smaller agencies with limited 
resources, staffing is inadequate for any of the complex 
evaluation tasks such as process or impact assessments. 

As Holley (Appendix C:258) notes: 

In most states certification standards are 
applied to personnel in federal programs. For 
example, a counselor, administrator, or supervisor 
must be certified to fill those roles in most 
states. In general, evaluators are not certified 
and no such standards are applied to the personnel 
filling the role of evaluator. In some LEAs and
SEAs, the federal program director or coordinator 
may bear full responsibility for evaluation and 
even in agencies with substantial evaluation units, 
small federal evaluations may be completed by

To accomplish this requirement, it may be necessary to
spell out in legislation dealing with evaluation 
activities the resources, coverage or target groups, and 
program services to be reported on by each recipient unit 
(local education agency, state education agency, 
community based organization, or other public or private 
agencies). Congress should also require that the 
Department institute quality control procedures that will 
ensure usable and comparable data on program funding and 
coverage. 

Evaluation tasks that go beyond accountability 
questions—for example, the assessment of educational 
impact or the identification and testing of alternatives 
that might lead to improved programs—should be a 
selective activity rather than imposed on all, regardless 
of competence and funds available. This recommendation 
is not meant to suggest that creativity in providing 
effective education cannot be found in school systems 
with limited resources. Inventive teachers and 
administrators have always found ways of applying the
lessons learned through experience to their classes and 
their programs, but they do not do it through formalized 
evaluation (David 1978). The task of understanding 
promising approaches and applying such understanding to 
program improvement at various sites is an extremely 
complex one that needs considerable investment of fiscal 
resources and the skill of highly trained people who are 
unlikely to be available to every school system and state 
agency in the country. Nor is it necessary that every 
site carry out that type of evaluation. If more were 
known about how to provide effective services through 
studies carried out at a limited number of sites and if 
school systems were then encouraged to try those 
alternatives that appeared most promising, program 
improvement could be expected. 

The description by Holley (Appendix C) of three 
alternative means of funding local evaluations documents 
the utility of providing discretionary funds on a 
competitive basis for program improvement. Congress may 
wish to consider authorizing a grants program for school 
systems that would allow funding of the most promising

policies for extramural work.

In particular, state and local agencies need to be 
aware of the desirability of separating the evaluation 
unit from program administration. Especially in the case 
of impact assessment, there is an obvious conflict of 
both intellectual and monetary interest. Evaluators 
should in general be outside evaluators, and evaluations 
should not be controlled by the program administrators. 
The case is more ambiguous for formative 
evaluations—those that are aimed at improving programs. 
Responsible program administrators should be doing this 
kind of self-evaluation as a matter of course, but there 
are also powerful advantages of having outsiders do this 
kind of evaluation: outsiders bring a fresh and unbiased 
view and are likely to see new ways of solving problems 
in program administration and new approaches for 
improving program benefits. They are also not 
constrained to cover up inadequate performance, as 
internal evaluators may be inclined to do. The best 
approach may be to encourage continuing in-house 
evaluation efforts, but also to encourage agencies to 
make greater use of qualified outside evaluators. 
Technical assistance should help agencies organize their 
evaluation activities in such ways that they can derive 
the maximum benefit from their (and the federal) 
investment in this area. 


Recommendation C-4. Congress should require an annual 
report from the Department of Education on all evaluation 
expenditures and activities, including those at the state 
and local levels. 


The current evaluation report delivered to the 
Congress annually should be expanded to cover all the 
evaluation activities within the Department as well as 
those carried out by state and local agencies with 
federal education funds. Past annual reports have 
concentrated on the activities of the central evaluation 
unit; they have not been comprehensive with respect to 
evaluation activities carried out elsewhere in the

plan to the Small Business Administration (SBA), which
must have approved it for SBA assistance. An 8-A firm 
can be selected to deliver goods or services to the 
federal government without having to compete with 
other firms. 

5 A resource list compiled by NIE of minority firms 
competent to do R&D work in education during that 
period contained 185 entries; about two-thirds were 
8-A certified. 

6 Some of these programs are the Graduate and 
Professional Opportunities Program (GPOP) in the 
Department of Education, the Minority Fellowship 
Programs in NIMH, the Minority Postdoctoral Fellowship 
Program and the Women and Minorities Program in NIE, 
the Minority Access to Research Careers (MARC) in both 
NIH and ADAMHA, the Minority Fellowship Program in 
NSF, and the Health Center Opportunities Grants (HCOG) 
in HRA. 

7 This information, including the dates given in Table 
1, was provided by Priscilla (Pat) E. Dever, 
Administrative Officer, Office of Program Evaluation, 
U.S. Department of Education. We are grateful for her 
help and patience in responding to our inquiries. 

8 The 16 regional educational laboratories and R&D 
centers have a special relationship with the federal 
government through which they receive core funding 
outside the competitive process, some of it for 
evaluation studies, though they may—and several
do—also bid on RFPs. Of ten $5-million-plus
performers of educational R&D, two are regional 
laboratories; nearly all these institutions fall into 
the $1 million and over or "major performer" 
category. Because they have long-term relationships 
with the Department, they are in a favorable position 
to receive contracts for evaluation work. 

9 This provision was enacted at the behest of state 
education agencies so that they could plan adequately 
for their own data collection systems. It is 
questionable, however, whether evaluation studies that 
gather one-time information (even if collected more 
than once, as in pre- and post-testing or in 






cooriT J,y ° r ^nageriaTT;, uecause th <* project is 
Spiels" With °^ ec fed y e “ m f v leX ° r requires clos 
eomro 6 policy sturn! lly sponsored work. 

e^f e t Subc °ntraoti na , S ' P rojects requiring 
defin^ 10nS of federal' ^ r9e cur rlculum projects, a, 

11 A basic ordering agreement is a written instrument
between the government and a contractor, setting forth
negotiated clauses to be applicable to future contracts,
including a description of supplies or services to be
furnished and of the method for determining fees to be
paid. This instrument is generally used in conjunction
with a selected group of contractors found to be
qualified to furnish the specified supplies or services
when needed.

12 A recent evaluation of ... (... Associates 1979)
... diverse views of ... effectiveness among state
agency personnel ... One of the reviewing panel's
recommendations was that ... enlarged objectives and
needs ... evaluation ... of Title I including
technical assistance, particularly ... uses of
evaluation for local program improvement and the
strengthening of local evaluation capacity


4 

Using Evaluation Results 


A frequently voiced statement about evaluation is that 
evaluation findings are rarely used. Often this type of 
statement is followed by the criticism that few policies 
have been changed and few programs either terminated or 
started because of the findings from evaluation. 
Implicit in this criticism is a belief that "utilization" 
means direct and often immediate incorporation into 
policy and program. The criticism carries weight mainly 
for those who have a definition of utilization that comes 
close to making it a substitute for the political 
process. We do not take that position. In our view, 
utilization takes on a variety of forms, not all of them 
immediately evident. 

Indeed, we maintain that the main goal that evaluation 
can rightfully espouse is that of being "useful": that 
is, evaluation-based knowledge is disseminated to those
audiences that have a need or an interest in it, is 
presented in a fashion that is understandable to them, 
and is addressed to the policy questions that are 
relevant to them. Evaluation cannot and should not 
substitute for the political process. Nor can evaluators 
ensure that evaluations are used. The best one can do is 
to make sure that evaluation findings are available to 
those who might want them and that the findings address 
the issues of concern in an understandable and 
responsible way. 

Because much of the difficulty with utilization 
centers around the differing meanings of that term, in 
the first two sections of this chapter we discuss the 
varieties of utilization and some of the limitations that
constrain the use of evaluation findings. Next we
because it is largely under the control of evaluators and
sponsors, can be improved by self-conscious efforts. 
Improvements in dissemination strategies can usually be 
made that, other things being equal, ought to lead to 
greater utilization and to change when indicated. But 
other things are generally not equal: the forces and 
events impinging on decisions about programs may be more 
powerful than evidence from evaluation activities. 
Moreover, such evidence is often couched in statistical 
terms that are not translated into terms having 
substantive meaning or that may not be substantively 
significant. 2 Steps can be taken to ensure wide and
effective spread of information and thereby improve the 
likelihood of utilization, but we know of no means that 
can ensure utilization, let alone change. 


Forms of Utilization 

There is currently a very strong emphasis on using the 
results of evaluation for making specific decisions at a 
given time; for example, when legislative or budgetary 
decisions are anticipated or when changes in program 
regulation or management are being considered. 

Sometimes, this perspective is appropriate, as was the 
case for the NIE compensatory education study, which 
began with some specific issues and fairly well-defined 
problems (National Institute of Education 1976) and chose 
to investigate factors that could be controlled through 
changes in policy (Hill 1980). The desire of those who
initiate and pay for evaluations (Congress, the 
Department, state and local governments) to obtain 
immediately applicable results is understandable, but it 
can lead to inappropriate expectations. 

In particular, the grounds for decisions cannot always 
be specified beforehand. For example, funding decisions 
are sometimes declared to be the policy questions that 
the results of evaluations are to address. Yet funding 
decisions are generally made on a variety of grounds, 
many of which cannot be addressed by evaluations, as has 
been amply demonstrated by the history of impact aid. 



identify the most effective model for wide-scale
implementation (Elmore 1975). It turned out, however,
that there was more variation within models than between 
models; moreover, increased funding to permit increases 
in the program never materialized. 

The possible decision issues also change over time in 
unpredictable ways. Turnover among federal executives is 
high. 3 Questions that are tied to the perspectives of 
an individual decision maker or of a particular 
administration may no longer be of interest when a new 
executive or administration takes over. Decisions also 
change as educational priorities change over two or three 
years, even under the same administration. 

In short, while evaluation for specific decisions 
appears to be a sensible strategy to follow, such a 
strategy may entail much wasted effort. The issues involved
in a decision that is to be taken at some time in the 
future are not easily predicted. Hence an evaluation 
started today that is directed towards the specific 
decisions envisaged two years hence is just as likely as 
not to miss the mark because the issues in the decisions 
will have changed. 

One implication of the above is that evaluations 
should seek out questions of lasting significance and 
provide knowledge that can be used and reused, knowledge 
that may be exploited in several different ways over time 
in addition to furnishing short-term information 
(Chelimsky 1977). Involved here are differences in types
of knowledge application, i.e., knowledge for 
understanding versus knowledge for immediate action,
sometimes also referred to as conceptual use (indirect 
impact on decision perspectives) versus instrumental use 
(direct, mechanical application) (Weiss 1977). To ensure 
the maximum utility of any major evaluation, it should 
address questions appropriate to both uses. Adopting 
this principle has consequences for the planning of 
evaluations (see Recommendation D-10, below).

A third use of evaluation can be called 
legitimization: the primary purpose of the evaluation is
something other than to develop knowledge about a 


overtly acknowledged, the use of information that results 
from such evaluation studies is not necessarily 
illegitimate provided valid data are reported and 
interpreted honestly. 


Misuse and Deliberate Nonuse 

One of the problems in defining the process of 
utilization is that not all study results ought to be 
used and that deliberate rejection or nonuse of results 
that are faulty or otherwise inapplicable is preferable 
to misuse. Misapplication of results is as much a 
negative consequence of evaluation as lack of 
application, and deliberate nonuse may represent rational 
decision making as much as does appropriate 
application. 4 The problem is that the deliberate 
nonuse after results have been carefully considered and 
dismissed for valid reasons is difficult to distinguish 
from the failure to use evaluation results for other 
reasons. 

Aside from nonuse for valid reasons, it is important 
to distinguish between the misuse or nonuse that results 
from lack of judgment and that which has as its
motivation the suppression of valid information. Persons 
who may not be fully aware of the standards of quality 
that should be applied to evaluation studies may hail the 
results of faulted work and condemn on seemingly 
technical grounds quite well-executed studies. This lack 
of judgment calls for attempts to inform potential users 
of the standards by which various types of studies should 
be judged. The recommendations made elsewhere in this 
report on open and systematic review of evaluation 
studies should be helpful in judging quality. (Our 
recommendations on training in Chapters 3 and 5 are also 
intended to address this problem.) 

Deliberate misuse or nonuse of evaluation studies is 
in many ways more difficult to deal with. First, it is 
difficult to detect motives. Second, persons who
deliberately abuse evaluation studies are not likely
to be dissuaded by arguments based on


LIMITATIONS THAT CONSTRAIN USE 


Just as the definitions related to utilization are 
important to understand if one wants to improve the 
utilization process, so are the functions of knowledge 
within any agency or for individual decision makers, at
whatever level. Evaluation cannot and should not
replace the political process. This means that an 
automatic translation of evaluation findings into policy 
decisions is neither desirable nor to be expected. 

Policy makers cannot override the ideological, political,
and financial limits they face, though these limits are
themselves subject to change over time, aided by the
accumulation of knowledge. Decision makers and managers 
are not always able to take actions that seem to the 
researcher the "best" form of intervention or 
implementation. Both the feasibility and the 
acceptability of a change in public policy are as 
critical as science-based knowledge in determining the 
course of a decision (Ezrahi 1978). Thus a program that 
is feasible and effective but likely to arouse the 
resistance of significant constituencies, or that can be
funded only at the expense of some other more desirable
program, or that is liable to antagonize school
administrators or teachers, is not likely to be adopted.
Nor should it be, given that legislatures and public
officials are expected to be responsive to such 
realities. There is no special democratic license given 
to the results of evaluation that allows such results to 
override the ordinary political considerations that 
surround education just as they surround other important 
areas of social policy.

programs are made as if there were sovereign rulers in
government. Yet evaluation reports are often written as 
if such individuals existed and were able and ready to 
act on evaluation findings and recommendations. As we 
noted above, the persons who initially ordered and 
collaborated in planning evaluations and their 
utilization may have moved on to other responsibilities 
by the time findings are available. Their successors 
often have less interest in or less understanding of the 
purpose of the evaluation. In addition, interests 
sometimes shift rapidly at the top echelons of government. 

Having some documentation of the purpose and 
importance of a study that can be referred to after the 
authority for decisions has changed would help in 
utilization. However, as has become evident from 
research on organizations (see, for example, Cohen and 
March 1974, Cohen et al. 1972), policy is often not 
"made"; rather, it accumulates by slow accretion. New 
information may actually slow down the process since it 
may make decisions more complicated. Thus, one has to 
think of policy formulation and decision making as 
involving different stages, different people, and a 
process of absorbing and digesting all types of 
information: tested empirical findings from evaluations 
are only one of those types. 

While the reduction of ignorance may always be 
desirable, it is not synonymous with the reduction of 
risk. In fact, new information may produce considerable 
risks. As it enters an organization, perturbations go
through the organization: established assumptions and
ways of doing things become threatened, agenda priorities 
and budget line items may be thrown into question, and so 
forth. The common response to such threats is to let 
procedure take precedence over substance and to ignore 
the message of the new information in the interest of 
preserving established procedures and structures. To the 
outsider, it may appear that the information is ignored, 
though it may be used informally. Studies carried out on 
the use of knowledge among upper-level federal officials 
in the United States and abroad show that the control of 
information is more important than its use (Caplan 


the use of social-science-based knowledge by federal
executives was to increase administrative efficiency and
organizational control. The use of results from program
effect studies has been more difficult to discern, and
even when such studies are cited, it is not the findings
on effects, but those on coverage and management that are
used. The evaluation study of the bilingual education
program provides a good example (Danoff 1978).

Fourth, a continuing problem in relation to
utilization is the failure to spell out the ways in which
the information developed by a study could be applied.
What policy options appear preferable to reach certain
goals? What management strategies deliver services
effectively? What are the outcomes of different
curricula in different types of classrooms, for different
types of students? When evaluation studies address
questions not perceived as important by a particular
audience, that audience is likely to consider the results
irrelevant and useless. For example, a number of local
sites have reported that the data required by the federal
government on Title I and other education programs are
not useful to the local agency (David 1978), while others
consider such data useful but needing to be augmented by
specific local studies in order to gauge program progress
(Boruch, Leviton, Cordray, and Pion, Ch. 6 in Boruch and
Cordray 1980).

Fifth, there has been little attempt to specifically
reach audiences concerned with equal educational
opportunity. Women, minorities, and handicapped people
generally believe they have limited access to social
science research and evaluation processes that they see
as affecting programs that are significant to them.
Because of this perception of exclusion, some of the
largest groups involved in equal opportunity issues, such
as the NAACP, ASPIRA, COSSMHO, the National Urban League,
and the National Council of La Raza, are developing their
own capability for research and development or have begun
to work closely with research organizations willing and
able to address issues of interest to minority
groups. The Council for Exceptional Children performs a
similar function for programs serving handicapped



Corporation 1976) and the Title VII bilingual education 
study (Danoff 1978). (Citations in congressional
documents of these studies and other documented uses are 
given in Boruch, Leviton, Cordray, and Pion, Ch. 6 in 
Boruch and Cordray (1980).) In Chapter 5 on the 
organization and management of evaluation activities, we
make some recommendations pertinent to increasing the 
relevance of evaluation studies. Timeliness in 
particular and current impediments to completing studies 
on time are treated at some length in Chapter 5 (and also 
in Chapter 3). We reiterate the need for quick-response 
evaluation capability on the part of the Department, as well
as sophisticated planning of major evaluation tasks that 
will yield at least some useful results at the time they 
are needed by primary decision makers in Congress or at 
the top levels of the Department. 


Communication of the Information 

The many factors that have been identified in the 
literature as enhancing the transfer of knowledge and its 
use can be grouped under two headings: communicability 
and linkage. Communicability encompasses matching the 
style of communication used by the researcher or other 
transfer agent (see below) to that of the primary 
audience(s). Since researchers are not necessarily the 
most effective communicators, nor will they always be on 
call when needed, linkage by means of transfer agents is 
necessary. 

Several principles about communicability have emerged 
from the literature and successful practice: 

• Intelligible reports. Reports to primary 
audiences should be tailored as much as possible to their 
needs and their situation (Patton et al. 1977). Language 
should be understandable and situationally applicable; 
e.g., papers and reports written for scholarly audiences
are rarely appropriate for the primary or other 
audiences. Too often, social science researchers write 


properties and the niceties of the statistical analyses.
Reports should avoid jargon, be written in plain English, 
and address in a straightforward manner the issues 
relevant to the intended users and their informational 
needs. If a number of different audiences have primary 
interests, several versions (or translations) of a report 
may be necessary. 

• Accentuating the positive. Whenever possible, 
recommendations ought to highlight positive action steps 
that can be taken. Things not to do are important to 
recognize as well, but they rarely carry the same kind of 
reward for individuals in a position to act. 

• Live communication. The print medium is not the 
only nor even the most effective means of communication. 
Face-to-face interaction and reporting through 
conferences provide alternative mechanisms. These allow
for clarifying questions and making sure that the most
important points are covered. Information is more likely 
to be used when it comes from sources that are trusted, 
and human beings trust other human beings whom they have 
found to be reliable in the past more than they trust a 
computer terminal. Redundancy of communication has 
proven effective, so that optimal dissemination 
strategies are likely to include both oral and written 
communication. 

As we noted above, linkage is the term used to cover
the gap that may exist between researchers and the 
audiences for their findings. Techniques to create 
linkage derive from research on communication and the 
spread of innovation (Katz and Lazarsfeld 1955, Rogers 
1962). Lippitt (1965) and Havelock and Lingwood (1973) 
single it out as the most critical step. The issue is 
not just mechanisms of knowledge transfer, but 
information management, storage, retrieval, and knowledge 
synthesis. Past RD&D (research, development, and 
diffusion) efforts by the Office of Education were 
premised on the assumption that knowledge transfer and 
linkage through organizational arrangements would be 
effective, but the example of the Congressional Research
Service shows the importance of people who act as the



TOWARD INCREASED UTILIZATION 


The preceding sections have attempted to define various 
types of knowledge use, discussed the setting or context 
for use, and briefly reviewed the evidence on the degree 
of use. Before considering what might be done to 
increase the use of evaluation results, we summarize what 
has been learned about the utilization of research 
knowledge in general. The research literature is replete 
with recommendations on how to improve the likelihood 
that knowledge will get transferred from producer to user 
and actually used (see, for example, Havelock 1969, Davis 
1973, Glaser 1973, Havelock and Lingwood 1973, Rogers and 
Shoemaker 1971, Zaltman et al. 1973). Those 
recommendations tend to cluster around two sets of 
factors: the nature of the information and how it is 
communicated. 


Nature of the Information 

The ways in which knowledge is produced and is perceived 
by its potential audience(s) affect its use. The 
important characteristics of knowledge associated with
increased likelihood of use can be summarized as
intuitive correctness, objectivity, and relevance (Caplan 
... Obviously, there is not much that researchers can

... knowledge that fits the first
... objectivity ... affect the researcher's



or program managers must negotiate what issues and 
information needs can be addressed in terms of 
researchable questions and what types of data it will be 
possible to collect at program sites. Such negotiation 
is not a one-time-only task; it should proceed throughout 
the evaluation so that the study is not stymied or does 
not turn out to be irrelevant. 

• Appropriate research forms. Insofar as 
methodological limitations allow, the research should aim 
to use the policy maker's or primary user's definition of 
the problem. Researchers too often tend to define the 
research to fit methodologies rather than the interests 
of the likely audience. The law of instruments has a way 
of taking over: that which can be measured is measured, 
whether or not it addresses objectives or concerns of 
interest to the policy makers or program managers. 

• Realism. The research questions addressed and
the interpretation of results must deal with options that
are realistic for the decision makers expected to take 
action. The variables under study should be ones that 
are politically malleable: that is, they can be changed, 
if necessary, in order to improve policy or program 
substance. For example, periods of reading instruction 
can be lengthened, but a 1:1 student/teacher ratio, even 
if effective in teaching reading, is unrealistic on a 
wide scale because of its cost. Implications and
recommendations must take into account the constraints of 
likely users, such as political acceptability or budget 
limitations. 

• Timeliness. It is especially critical for direct 
knowledge application that information be timely. If a 
study is to provide input to legislative or funding 
decisions, but is not geared to the authorization 
calendar or the budget cycle, it will be irrelevant to 
the primary audience(s). While what may be relevant 
today may not be relevant tomorrow, increased contact 
among parties at interest and evaluators will improve the 
probability that relevant questions will be addressed. 

Attention to these elements was a major factor in the 
success of the NIE compensatory education study (Hill 





separate dissemination component) or by parties external
to either the research or the user communities. 

Some important factors that affect linkage include:

• Responsiveness to differences. Transfer agents
or groups must be responsive to differences between 
researcher and audience and to differences among 
audiences—perspectives, values, motivation, and 
language. They must know how to translate from one to 
the other and when direct interaction should take place 
and when not. (For example, some researchers make 
excellent congressional witnesses, others—equally 
eminent in their field—do not.) 

• Mediating problem definitions. Even at the
beginning and during the course of a study, transfer 
agents can be useful because—speaking the language of 
both the researcher and the audience—they can help 
define policy decision problems in researchable terms.

This role can be especially important when the intended 
user is not the immediate sponsor of the evaluation and 
therefore does not have automatic contact with the 
researcher. Problem definitions and criteria used by 
those requesting an evaluation must be understood by the 
researcher and be a guide to what will be done in a 
study. They must also be clarified so as to be 
researchable, or the reasons they are not must be 
conveyed to those requesting the evaluation. (As we
noted in Chapter 2, examples of unresearchable problems 
are the measurement of effects for diffuse or broad-aim 
programs for which objectives cannot be specified, the 
measurement of the aggregate effects of a program that 
takes different forms in thousands of different locales, 
or the effects of weak treatments administered in complex 
settings.) 

• Human agents. Linkage is best achieved by people 
rather than by cold-terminal (computerized) systems,
although this may change as the computer culture becomes 
more pervasive and terminals become more accessible in 
location and in language. At present, however, decision
makers are still used to face-to-face communication for 



Bureaucracies look for information that comes from the
inside and find it more credible. This characteristic is 
also true of other people in the evaluation process, such 
as the various interest groups. For example, teachers 
tend to consult other teachers and their professional 
associations when they need information; groups 
representing minority interests have set up their own 
research components. It also applies to knowledge 
producers, i.e., researchers, particularly those who are 
university-based and are not dependent for their 
livelihood on communicating with potential sponsors of 
evaluations. Transfer agents can help make all these 
groups more aware of outside information. But to go 
beyond awareness and expect linking or transfer agents to 
increase responsiveness to information would require them 
to understand the function of information in each group 
and the risks that the use of information entails for 
each. 9 Transfer agents are not likely to be able to 
counteract behavior based on maintaining cherished 
assumptions or well-established procedures and that 
therefore has a need to ignore perturbing research 
findings. 


Recommendation D-9. The Department of Education should 
test various mechanisms for providing linkage between 
evaluators and potential users.

The Department might consider establishing a unit 
charged with studying, developing, and instituting 
knowledge transfer mechanisms and evaluating their 
effectiveness. Alternatively, outside experts might be 
charged with this responsibility. Appropriate activities 
of a linkage unit, whether within or outside the 
Department, would include: 

• Helping assess proposed dissemination plans for 
evaluation studies and suggesting improvements; 

• Performing needed translations of evaluation 
reports so that they can be understood by the intended 
audiences; 





on a continuing basis; and

• Developing and installing regular procedures and
institutionalized arrangements designed to facilitate the
use of evaluation data on a day-to-day basis, at least
within the Department.


AUDIENCES FOR EVALUATION FINDINGS 

If the main purpose of evaluations is to help develop 
more effective policy and improve education programs, who
are the audiences that are likely to use evaluation 
results in this way? What kinds of information do they 
need? And how can evaluation planning be improved to 
better serve those needs? 

Conventionally, evaluations at the national level have 
been considered relevant to two primary audiences: 
policy makers in Congress and in the federal agency 
(i.e., the Department of Education) and federal program
managers. In this simple view, policy makers would use 
the findings from evaluations to determine present and 
future program needs and directions, and managers would 
have a tool by which to improve the delivery of services 
mandated in programs. As evaluation results have become 
visible, however, it turned out that they have also 
served as ammunition for critics of controversial 
programs or as support for a program's advocates. 

Federal legislators, convinced of the importance of local 
decision making in education, have also been concerned 
with local use of evaluation results to improve programs 
within the local school system. 

Empirical evidence from studies of the use of 
evaluations (e.g., Boruch, Leviton, Cordray, and Pion,
Ch. 6 in Boruch and Cordray 1980, Brickell 1974, Alkin et
al. 1979) has shown that not all of those audiences can 
be served by any single overall study. The information 
needs of diverse audiences with varying and sometimes 
conflicting interests and perspectives make it virtually 
impossible for one evaluation study to satisfy them all. 
Policy makers may be mainly interested in coverage



Reisner 1980). The major evaluation strategy used since 
the inception of this program has been collection of data 
at the local level that, through aggregation at the state 
and national levels, was to serve the information needs 
of all three levels of government. The result has been 
the generation of large quantities of data that have not 
been useful at either the local or the national level—a 
costly and frustrating process leaving all parties 
dissatisfied. The failure of Title I evaluations has 
been blamed on the lack of competence at the local level 
to collect data that can be aggregated. While the 
competence of some local evaluation units may be an 
issue, the history of Title I evaluations illustrates a 
much deeper problem, namely, the confusion of evaluation 
purposes. The original intent of the congressionally 
mandated local evaluations was to serve the needs of a 
local audience, defined by some to be the parents of poor 
children and by others to be the local school 
administrators and teachers. Later demands for assessing 
the overall effects of Title I spawned a complicated 
system of aggregating from the local to the state level 
and from the state to the national level. When it turned 
out that data emanating from thousands of different
sources were noncomparable, Congress mandated technical
assistance to the local systems to help with procedures, 
designs, measures used, and problems encountered at the 
local level. Models for evaluation designs were 
developed and the technical assistance centers were 
created to instruct local evaluators in proper use of the 
models. Yearly costs for this assistance system now 
stand at $12 million, more than half the budget of the 
central evaluation unit. And yet complaints about the 
utility of Title I evaluation information continue. 

Local school systems find the data they are required to 
collect by federal directive of little use to them and, 
if they have the resources and the competence, they 
conduct their own program improvement studies. At the 
national level, Congress has consistently expressed its
dissatisfaction with the information it receives, as 
evidenced by the rewriting of the evaluation requirements

to its audience, citing specific changes in law and
regulations in six major program areas directly traceable 
to study findings. Much of the success of this study as 
contrasted to all the other Title I evaluations is
explained by its director (Hill 1980) as due to the
extensive consultation with the primary audience,
Congress.

To increase the probability that results will be used, 
the plans for an evaluation should spell out who the 
primary audiences are likely to be and how it is planned 
to reach them, so that both the substantive issues and 
the dissemination strategies can be negotiated with 
them. However, there will often be a number of secondary 
audiences. For example, an evaluation concerned with 
testing alternative curricula in career education to 
facilitate local choice may also affect the regulations 
governing federally supported vocational education 
programs. For evaluations conducted at the national 
level, decision makers (within the agency and Congress) 
and managers at the federal level are likely to take 
precedence. But where federal funds are made available 
for state and local evaluations, needs at those levels 
should be served. 


The Role of Planning 

Although planning does not necessarily lead to an agenda 
that is subsequently carried out in detail, the act of 
planning always leads to an improved sense of priorities, 
provides a forum in which competing interests can reach 
accommodations, and induces an active as opposed to a 
reactive stance toward essential activities. 


appropriate information for the predictably recurring
legislative cycles on education programs. This entails a 
standard sequence of studies—timed to be available for 
reauthorization and appropriation hearings—that will 
furnish information on the coverage of programs, 
descriptions of how they are run, and a synthesis of 
information available at any given time of what can be 
said about their effects. Second, there must be an 
ongoing program of evaluation studies carried out at the 
deliberative pace required to address problems that are 
poorly understood. Third, the Department must have the 
ability to respond to interesting questions that arise as 
a result of ongoing research, changes in policy, or 
development of new programs. 

In the past, the central evaluation unit of the 
Department has concentrated resources on massive studies, 
in part because such studies require fewer procurement 
actions to allocate available funds. But big studies 
invariably take longer than anticipated and become highly 
inflexible; hence they often end up addressing matters of 
tangential interest to the audience at hand when they are 
finally completed. Any evaluation plan for a major 
education program should contain a series of linked 
studies, some of which furnish factual information that 
can be obtained in reasonably short time and some of 
which address issues of long-term interest. Thus, at any 
particular time and especially at predictably recurring 
decision stages, one or more additional sets of findings 
about a program will be available. Additionally, the 
value of the whole evaluation plan does not depend on the 
success or failure of a single massive study or on the 
performance of a single contractor; there will always be 
some useful studies resulting from the overall plan, even 
though some may not turn out as hoped. In addition to 
the plan for the NIE study of Title I, examples of such 
evaluation planning are the original plan to evaluate the 
Education for All Handicapped Children Act (U.S. 
Department of Health, Education, and Welfare n.d.) and 
the Department's new evaluation plan for Title I of ESEA 
developed in 1979 (U.S. Department of Health, Education, 
and Welfare 1979c) . 11 The Committee applauds the 



outlined through the type of evaluability assessment 
described in Chapter 2 or through some similar process. 

The absence of a reasonable planning system in the 
Department has had two deleterious consequences. 

First, it has given rise to an emphasis on activities for 
"putting out the fire"—projects done in response to an 
immediate crisis because no suitable information was at 
hand when the question arose. Not infrequently, such 
projects are irrelevant by the time they are completed, 
either because the crisis has subsided or a different one 
has arisen and attention has shifted. The emphasis on 
addressing immediate concerns has reduced the 
Department's ability to evaluate programs on a recurrent 
basis in a fashion that would cumulate evidence on their 
implementation and effectiveness over time. Studies to 
develop and test out more effective program alternatives 
receive even shorter shrift. 

The second effect of the absence of appropriate 
planning has been to create yearly uncertainty, beyond 
that created by the budget process, about what studies 
the Department will undertake. When yearly planning is 
not set in the context of approved ongoing plans, the 
approval process takes longer than necessary and may be 
subject to capricious and arbitrary decisions. The 
history of fiscal 1980, when it took 6-9 months to obtain 
approval for initiating a study, provides a vivid example. 


Recommendation D-11. The Department of Education should 
establish a quick-response capability to address critical 
but unanticipated evaluation questions. 

No matter how flexible the planning system, there will 
be a continuing need to respond quickly (within a 2- to 
6-month time frame) to evaluation-related questions that 
come from the Congress or from top-level Department 
officials. Department staff charged with evaluation 
responsibilities must be in a position to deal with such 
requests. In some areas, in-house expertise may exist, 
but even under the best of circumstances such expertise 


through the RFQ process, can be awarded small contracts 
within days for work that is limited in scope and time. 
This mechanism in the form of basic ordering agreements 
has been used by the Assistant Secretary for Planning and 
Evaluation (ASPE) in the former HEW; the dollar limit on 
contracts was $60,000. 

• Highly qualified selected organizations can be 
awarded contracts that pay for a given number of 
person-hours of effort, with tasks to be specified as the 
need arises. This mechanism has been used in the 
Department of Labor, with the limit for any one-year 
contract set at $200,000. 

• 8-A contracts and awards to SBA-eligible firms 
can usually be executed more quickly than other types of 
contracts. 

In order to be fully responsive to the information needs 
of its primary audiences, the Department must be able to 
combine a deliberative planning process that allows time 
for field and constituency involvement with a 
quick-response capability that can address unanticipated 
but critical evaluation questions as they arise. 

The ability to serve short-term information requests can 
be considerably enhanced in any program by the 
development of good management information systems. 

Thus, for example, if a good management information 
system had been in place, it should have been possible 
for the Spanish/English bilingual education program 
(Title VII) to have provided Congress with detailed 
information on the ethnicity and language status of the 
students being served. Instead, a study intended to 
assess the impact of the program had to use a 
considerable share of its resources for documenting 
program coverage (Danoff 1978). Similarly, such 
questions as the trends in composition over time of 
students enrolled in education courses in colleges and 
universities ought to be routinely collected as useful 
and necessary background data on the future supply (over 
or under) of teachers. 

For many programs that are not funded through the 
Department, the provision of such management information 



trying to cover all contingencies. we *: 

5 below, grantee reports have too often been collected 
without ever being reviewed. 

AUDIENCES FOR EVALUATION FINDINGS 

The discussion of different audiences for evaluation 
results that follows tries to indicate different 
information needs for each. Two facts should be noted: 
there are important distinctions within broad classes of 
potential users or audiences, and sponsors are sometimes 
but not always synonymous with primary audiences. The 
latter fact means that the process of negotiating 
research questions and other substantive issues may have 
to involve a number of parties. 


Primary Audiences for National Evaluations 
Executive Policy Staff 

This category includes individuals with authority over 
resource allocations and the design of programs, most 
importantly, senior-level agency officials and their 
analytical staffs and budget examiners in the Office of 
Management and Budget (OMB). It is rare, if ever, that 
these officials are waiting for evaluation study results 
in order to make up their minds on what policies to 
pursue or what programs to fund. The weight of an 
evaluation may be slight in comparison to the 
constellation of interests and other reasons for deciding 
one way or another, even in ways counterindicated by an 
evaluation study. 

The temptations to misuse or not use the results of 
evaluation studies are all too clear; hence the 
importance we place in this and other chapters on the 
obligation of evaluators to release findings 
independently of executive decision makers. These 
temptations are also the reason (as we indicated in 


generally not in the "loop" of people who normally 
receive evaluation reports, so their information needs 
may be served inadequately. In addition, turnover of 
top-level agency officials in education has aggravated 
the problem of loss of information and institutional 
memory. On the other hand, agency officials have the 
advantage of being able to draw on their policy and 
evaluation staffs, who are probably the most consistent 
users of evaluation data while also being the likely 
immediate sponsors of evaluations. 

The potentially short life of evaluation findings, 
even though the knowledge might be useful at a later time 
and in a different context, means that dissemination 
should not be just a one-time effort. Archived 
evaluation studies that are difficult to obtain and whose 
existence is difficult to determine are useless. Hence 
some attention should be given to the problem of 
re-dissemination of evaluation findings, perhaps in the 
form of summaries or reviews of past evaluation findings 
for executive-level officials as programs and policies 
come up for review. 


Congressional Policy Makers 

It is a mistake not to differentiate among congressional 
users of information. Rarely are members of Congress 
direct and immediate audiences. Rather, the initial 
contacts are more often with the Congressional Research 
Service (CRS) staff, committee staff, or personal staff 
of members of Congress. In addition, staff of the 
Congressional Budget Office and of GAO are frequently 
prime audiences for evaluation studies. CRS, as part of 
the Library of Congress, functions as a quick reference 
service for both members and committees of Congress; GAO 
carries out special studies at the behest of 
Congress. Congressional staff themselves differ in 
their use of evaluation information: senior staff of 
committees are generally better informed users of 
evaluation results than personal staff of individual 



proof, perhaps, of the fact that budgetary decisions 
often are not heavily influenced by the results of 
program evaluation. 

It is relatively easy to document the explicit use of 
evaluation studies by Congress and its staff: who makes 
what information requests and receives responses from 
CRS, who has received copies of evaluation studies, and 
who refers explicitly to those studies in committee 
reports and in the published remarks of members of 
Congress. But there is also a more informal and diffuse 
infiltration of information into congressional discourse 
that is much more difficult to trace because it leaves no 
explicit markers. Thus, a Congresswoman who remarks on 
the floor that a particular program is working well may 
mean that she has talked to a school principal in her 
district who assured her that without the program his 
schools would be suffering, or she may mean that she has 
received a memo from one of her staff who had summarized 
an evaluation report from the Department of Education, or 
she may be referring to an assessment from GAO, or she 
may merely be expressing her own opinion based upon 
whether or not the program is "in line" with the kinds of 
things she usually supports. We suspect, along with 
others, that this informal, diffuse use of evaluation 
results may be the most important use of all, but it is 
not something for which one can readily provide direct 
documentation. 


Federal Program Managers 

Program managers are likely to be interested in 
information that can improve delivery of educational 
services at the local levels. Since they are often 
already committed to a given program, effectiveness 
information may seem irrelevant to them except insofar as 
it enhances support for the program. On the other hand, 
information on how programs are being implemented and 
what services are being provided to what beneficiaries 
can lead to improvement in program regulation and 


findings or process evaluations are too disruptive of 
established procedures, they are not likely to be 
implemented. 


Recommendation D-12. The Department of Education should 
ensure that evaluations deal with topics that are 
relevant to the likely users. (See Recommendations C-1 
and D-1.) 

As discussed earlier, relevance is not easy to 
achieve, but it is relatively easy to specify procedures 
that will make it more likely. Such procedures include: 

• Primary audience(s) must be specified from the 
beginning of the study. 

• Arrangements must be made to facilitate 
communication between evaluators and intended users at 
the inception of a study and throughout its course. This 
will help ensure the fidelity of the evaluation to the 
questions of interest to the identified audience(s) and 
will also help obtain commitment and interest on their 
part. Current administrative restrictions that inhibit 
that kind of communication should be removed. 

• When the goal of an evaluation is to provide 
information for decisions at specified times, such as the 
reauthorization of programs or annual program 
appropriations, reports must be delivered on time. If a 
study has been delayed, its abortion should be considered 
unless some aspects will address longer-range concerns. 

• Evaluation monitors should be charged with the 
responsibility of including in their routine monitoring 
information about events and changes that carry 
implications for the usability of findings. Changes in 
evaluation design or methodology are sometimes made in 
response to field conditions, budgetary and clearance 
constraints, or for other reasons. Such changes may have 
sufficient impact on a study so that the research 
questions framed to be relevant to the identified 
audience(s) can no longer be addressed adequately. 
Changes in the conduct of an evaluation that have such 
impact on the possibility of utilization should suggest 
rethinking the objectives of the evaluation or 
terminating it altogether. 



address the information needs of a nonfederal audience, 
for example, representatives of minority and other 
beneficiary groups. For studies initiated by or at the 
behest of any of these other audiences, our 
classification of primary and secondary audiences would, 
of course, be reversed. 


State and Local Agencies: Central Staff 

The distinctions made at the federal level among decision 
makers, evaluation (and other analytical) staff, and 
program managers are also important at the state and 
local levels. The motivations and general information 
needs of the staffs are analogous, but focused on the 
program as it operates in the local setting. Since the 
policy variables that can be altered by state and local 
administrators are considerably different from those that 
can be altered by federal staff and Congress, evaluations 
must address different questions. Similarly, program 
management at the federal level entails quite different 
responsibilities from program management at the state and 
local levels, and process evaluations that are intended 
to improve management must be sensitive to these 
differences. 


Local Agencies: Principals and Teachers 

The individuals who actually provide the educational 
services intended by a program (and their 
representatives, such as the National Education 
Association (NEA), American Federation of Teachers 
(AFT), and associations representing school principals) 
can become a powerful constituency for or against a 
program, as has been demonstrated by the history of Head 
Start and the experiments with voucher programs. 

- offer help 


select whom they will teach. However, demonstrating 
differential effects of alternative program strategies 
may be helpful, since teachers can select the strategy 
most appropriate to their school situation and students. 


Program Clients and Their Representatives 

The ultimate targets of education programs are students. 
Since much of the investment in federal education 
programs is at the elementary level, obviously many of 
the beneficiaries are too young to be audiences for 
evaluation information. However, there have been 
specific attempts to address evaluations to parents so 
that they could use the results to improve their 
children's schooling. As we noted above, this was the 
explicit intent of the original Title I evaluation 
mandate (the first legislated requirement for evaluation 
in education) as originally proposed by Senator Robert 
Kennedy in 1965 (David 1978). The objective has seldom 
been met, even when parent advice was legislated into 
later Title I amendments in the form of parent advisory 
councils. Groups other than parents also speak for the 
interests of beneficiaries, most of whom are poor, 
members of minority groups, handicapped, or otherwise the 
targets of discrimination. The interest of these 
groups, which include the major advocacy organizations 
concerned with equal opportunity and minority issues, is 
to use evaluation information to ensure that the intended 
beneficiaries are adequately reached by the programs 
intended to serve them and that those programs deliver 
effective services. 


Researchers 

The outcomes of any evaluation study will be of interest 
to other evaluators and researchers who are concerned 
with development of educational policy, with 
instructional strategies and school management, and with 
the technical issues arising in the conduct of applied 




the course of the study; 

• Specification of timetable events, e.g., 
congressional hearings, that provide occasion for 
reporting on findings; 

• Mechanisms for reviewing and revising the 
dissemination plan during the course of a study to take 
account of changes in the study or in the context of the 
work; 

• Plans for archiving reports and other 
documentation of findings so that they remain accessible, 
with a guarantee by the contractor that data will be 
clean and accessible (see Recommendation D-7); and 

• A budget commensurate with the proposed 
dissemination activities. 


Recommendation D-14. The Department of Education should 
observe the rights of any parties at interest and the 
public in general to information generated about public 
programs. 


Though minimal dissemination is concerned primarily 
with the immediate or primary audience, other people 
having an interest in the program being studied are 
likely to demand and should have access to evaluation 
findings. This raises two issues: what are the special 
rights, if any, that should be afforded the agency that 
has requested and funded an evaluation, e.g., the 
Congress, the Department, OMB, or GAO? To what degree 
should traditional authority relationships be overridden 
in order to serve the public interest, i.e., what 
obligations do evaluation units and contractors have to 
disseminate findings to potential users who are outside 
the command and report lines within tables of 
organization? 

Findings from evaluations must be made available to 
those who are importantly affected by the programs being 
evaluated: for example, those who manage them, those who 
provide program services, and those who 
benefit (or their representatives). 




evaluators should be guaranteed a certain degree of 
autonomy. 

Four steps are needed to provide improved public 
access to evaluation findings: 

• Proper safeguards for maintaining the rights to 
privacy of individuals and organizations must be applied 
before release of findings; 

• The rights of the sponsoring authority to 
exclusive access to evaluation results should be limited 
in time; 

• The right of managers and executives to restrict, 
control, or suppress evaluation findings should be 
limited in time; and 

• Reports on findings should be accompanied, when 
available, by interpretations and critiques issuing from 
the review process recommended in Chapter 3. 

Appropriate changes should be made in contract provisions 
to allow contractors and grantees the necessary 
flexibility with regard to distribution of reports and 
other dissemination strategies. 


Recommendation D-15. The Department of Education should 
give attention to the identification of "right-to-know" 
user audiences and develop strategies to meet their 
information needs. 


Perhaps the most neglected audience for evaluation 
studies consists of program beneficiaries and their 
representatives. We recognize that this neglect is not 
so much intentional as it is produced by the very real 
difficulties of defining this set of audiences in a 
reasonable way. In order to more closely approximate the 
ideal that all those having a recognized interest in a 
program should have reasonable access to evaluation 
results, the Department should consider dissemination of 
evaluation reports freely to groups and organizations 
that claim to represent major classes of beneficiaries of 
education programs. Positive, active dissemination to 
such right-to-know groups may include such specific 



the "Sesame Street" evaluation that showed that, although 
the target population—poor children—had indeed made 
gains in reading readiness, as documented by the original 
evaluations, the gap between them and more affluent 
children had actually grown because the latter made 
greater learning gains. In order to provide for 
secondary research, reports and primary data and 
publication of evaluation-related material should be 
archived in professional journals and as monographs (see 
Recommendation D-7). 


Media 

Discussions of evaluations are more likely to find their 
way into professional and trade journals if results turn 
out to be controversial. If the program being evaluated 
is itself of sufficient interest, the controversies are 
likely to be picked up by the more popular media, 
newspapers, television, and radio. Obviously, these are 
secondary audiences for evaluation results, but the way 
in which evaluators communicate with them may make a 
crucial difference in the reporting and interpretation of 
what a program is all about and what evaluation is all 
about. 


Reaching Audiences 


Recommendation D-13. The Department of Education should 
ensure that dissemination of evaluation results achieves 
adequate coverage. 


Evaluation utilization has been assigned a high 
priority within the Department, but utilization cannot 
happen unless people have a chance to consider relevant 
information. Therefore, it is important to establish 
clearly that attention to dissemination is not a pro 
forma exercise. Indeed, the agency must, through its 
actions, indicate as great a commitment to dissemination 


scrutiny and assessment as are evaluation designs and 
methodology. 

At the very least, evaluation results must be 
communicated (delivered) to the primary audience(s). 
This requirement would seem self-evident, but it often is 
not met. Contract clauses routinely forbid dissemination 
before formal approval by the sponsor, which is sometimes 
withheld. As Boruch, Cordray, and Pion note (Ch. 5 in 
Boruch and Cordray 1980), this keeps some (though not 
all) evaluators from reporting on their findings. Also 
routinely, a very limited number of copies of final 
reports are printed (100 copies for most studies unless 
unusual circumstances exist), with the result that 
landmark studies like the Title VII bilingual education 
study (Danoff 1978) quickly become out of print. In some 
cases, a copy of the final report cannot even be found in 
the project files (Cook and Gruder 1979). In other 
cases, like that of the NIE compensatory education study 
(National Institute of Education 1977), a stockpile of 
copies actually exists, but it is difficult to get 
information about how to get copies. In cases of lengthy 
reports with multiple appendices, archives like ERIC 
contain only part of the material originally published. 
Restrictions on the number of copies and on archives—not 
to mention more costly dissemination strategies—are 
often imposed by contracting rather than technical agency 
staff in order to reduce budgets but without 
consideration of dissemination needs. 

All RFPs and grant announcements should include 
requirements for a dissemination plan that is oriented 
toward maximizing the likelihood of utilization. The 
evaluation of proposals should give appropriate weight to 
the quality of the dissemination mechanisms proposed. 
Budget negotiations should recognize that adequate 
dissemination is costly and cannot be an afterthought. 
Dissemination plans should include: 

• Specification of primary and secondary audiences; 

• Delineation of the different information needs of 
the specified audiences and how those needs will be 



careful consideration of the appropriate right-to-know 
groups should be part of the dissemination plans that 
contractors are asked to prepare as part of their 
response to RFPs and grant announcements. 

We recognize that this recommendation makes the whole 
process of sponsoring and carrying out evaluations more 
complex, but we consider the involvement of right-to-know 
groups critical. They often perceive themselves as 
having limited access to or insignificant involvement in 
evaluation efforts that may be used for policy and 
resource allocation decisions that concern them. 
Furthermore, such groups can have an important influence 
on the improvement of educational practice, and they need 
access to information so that their recommendations and 
actions are as effective as possible. Involvement of 
these audiences from the very outset of an evaluation 
enriches the public policy process both because it widens 
the universe of viewpoints and because, over the long 
term, it can improve the quality of education insofar as 
these groups are links to the communities that the 
government is attempting to serve. If they share in the 
evaluation process from the beginning, they are more 
likely to use the findings in their spheres of influence. 


Changing User Behavior 

Recently Sechrest (1980) has suggested that, if 
high-level administrators could be trained in how 
evaluations are done and how researchers present results, 
utilization would be increased. We include suggestions 
for such training in Recommendation D-17 in Chapter 5. 

We have some doubt, however, that top executives or 
members of Congress have the time for such training or 
would retain technical knowledge that they would use 
infrequently. If they did develop greater facility for 
the language of evaluation, they would certainly become 
more sophisticated readers. 

It is possible to think of incentives for use and 
sanctions against failure to use evaluation results 


planned changes had been made. Some states (Rhode 
Island, Massachusetts) do indeed require reports from 
local school systems on the use of Title I evaluations. 
However, there is also some danger that such requirements 
will turn into additional pro forma exercises. Required 
responses and actions might also make explicit some 
conflicts between managers and analytical staff about the 
value of a program or the effectiveness of its management. 

Recent reforms in the federal civil service provide 
special bonuses for effective program management, and 
appraisal of management is tied to the results of program 
evaluation (Office of Management and Budget 1979). 
However, the success or failure of a program is at least 
as much dependent on its design and legislative 
provisions as it is on the efforts of program managers 
and personnel, so the attempt to judge good management 
performance through program evaluation may be off target 
unless only those factors under control of the program 
manager are examined. A second effect of this particular 
incentive system has been to define management objectives 
in clearly measurable terms (e.g., items of priority mail 
answered on time) rather than in terms of the more subtle 
and less objectively measurable behaviors that are needed 
for effective program management, such as frequent and 
productive interaction with state and local staff. 

Sanctions for failure to institute changes suggested 
by evaluation results have also been suggested, for 
example, withholding program funds until the changes are 
made. The history of cutoff of federal funds for 
violation of civil rights laws suggests that this 
particular sanction is very unlikely to be imposed. 
Consequently, we make no explicit recommendation on the 
use of incentives or sanctions. However, the Department 
might consider requesting that federal program managers 
who have had their programs evaluated prepare evaluation 
use reports. These might be prepared within one year 
following receipt of the evaluation report and contain an 
assessment of the level and types of uses made (including 
reasons for nonuse) as well as an analysis of factors 
that impeded or facilitated use. If the Department 
proceeds with such a requirement, the dissemination and 



NOTES 


The literature on putting knowledge to use has grown 
as rapidly as the evaluation field itself. Davis (in 
Human Interaction Research Institute 1976) has 
estimated that, by the mid-1970s, the research 
literature concerned with the field of knowledge 
utilization included some 20,000 citations, compared 
with 400 such citations 20 years earlier. 

For example, Marsh et al. (in press) found that 
changes in rape law had produced a statistically 
significant decrease from 12 to 10 in the average 
number of examination procedures that a rape victim 
had to undergo if she reported the crime. Obviously, 
in substantive terms of victim humiliation, one could 
hardly report this as a meaningful change. 

The average tenure of a Commissioner of Education 
during the last decade has been less than 2 years; NIE 
has had six changes of leadership in 8 years. 

We analogize from a definition by Yin et al. (1976) of 
situations regarding the adoption of innovations: 
adoption is regarded as a positive outcome if the 
innovation leads to improvement but as a negative 
outcome if it does not; failure to adopt is a negative 
outcome only if the innovation would indeed lead to 
improvement but a positive outcome if it would not. 
Head Start teachers deciding to increase the time 
spent on prereading activities are as much decision 
makers in their realm as a superintendent installing a 
new curriculum, a state legislature passing an 
appropriation for compensatory education, or a federal 
program manager developing program regulations. 

Of course there is always a question as to who can 
represent beneficiaries. The Committee has made no 
attempt to address this question in any detail, both 
for lack of time and because we did not consider 
ourselves qualified to define such representatives. We 
note that there are sometimes groups that speak on behalf of 
specific beneficiary groups; their claims to represent 
these groups could, perhaps, be considered in the same 



but still under the auspices of the program, is not 
clear, particularly when future evaluation contracts 
from the same source are a possibility. Evaluations 
performed or sponsored by units outside a program are 
not necessarily free of bias either, whether performed 
in-house or contracted out, especially when top 
decision makers are known to favor particular points 
of view. 

8 Appropriate packaging has also been deemed important, 
but many counterexamples exist. For example, the 
attempt to develop social indicators resulted in a 
handsome publication (Office of Management and Budget 
1973, U.S. Department of Commerce 1977) with 
attractive and easy-to-read graphics, yet it has found 
limited use. 

9 As we discussed above, there are risks for 
bureaucracies of having to deal with new information. 
Other groups also run risks: for example, audiences 
concerned with equal educational opportunity may find 
negative results on programs they favor distasteful 
and disturbing. 

10 The distinction is not always clear. Sometimes, 
expectations for use at all levels are set up when 
data required at the federal level are collected by 
staff at the local level, as in the case of Title I. 

In some cases, it may be most efficient to sponsor a 
study at the federal level even when the results are 
pertinent to individuals at the local level; for 
example, testing the efficacy of alternative 
strategies for teaching reading. 

11 The national-level evaluation of ESEA is not intended 
to take the place of the three-tier evaluation of 
Title I based on local data collection and aggregation 
at the state and national levels. Rather, it is a 
substitute for previous efforts at the national level 
to study the effects of Title I, specifically, the 
sustaining effects study (Dearman and Plisko 1979, 

U.S. Department of Health, Education, and Welfare 
1979a, Baker and Ginsburg 1980). 

12 As described in Appendix A, fiscal 1980 was the first 
year for which there was a comprehensive review of 
In Chapter 2 we discussed in general terms the different
types of policy questions that are asked about 
established or proposed programs. In this section we 
consider what kinds of evaluations need to be carried out 
in order to address those policy questions for education 
programs. 


Accountability 

The Department is accountable for carrying out education 
laws in three respects: ensuring that moneys are 
allocated as specified, ensuring that benefits go to the 
targeted groups, and ensuring that civil rights 
provisions and service mandates are being met. 


Fiscal Accountability 

Because of the decentralization of education, the 
allocation of funds for most major programs takes place 
at all three levels of government: federal, state, and 
local. (A few programs provide for federal grants 
directly to local agencies.) Hence, all three levels 
must account for the use of federal education funds, and 
fiscal reports from local and state agencies form the 
basis for the Department’s own fiscal reports. Grantee 
reporting is checked periodically by the agency's 
inspector general. For a few titles, like vocational 
education state grants, such auditing is mandatory in 
law; for the most part, however, the Department has 
discretion as to what local and state reports and 
disbursements are audited. Nearly one-fourth ($10 
million) of all evaluation funds are spent on fiscal 
audits; generally, programs with large outlays (Title I
of ESEA, post-secondary grant and loan programs) receive
the most attention (see Appendix A).

As audits have gone beyond checking for sound fiscal 
management and into checking for compliance with legal 
requirements on the use of funds, the line between fiscal 
audits and other accountability evaluations has become 



have caused most school systems to provide "pullout"
programs that can be easily accounted for separately,
even though they may not be the preferred educational
option (National Institute of Education 1977).


Accountability for Beneficiary Coverage

Grantee reports have generally served as the most
comprehensive source of information on program
participation. Though local agencies are obviously in
the best position to count participants, there are two
problems with the use of such self-reporting:
reliability of the reported data and lack of information
on who is not being served. Reliability can be
documented through third-party checks on grantee
reports. If grantee reporting for a specific title turns
out to be highly unreliable, technical assistance on
interpretation of the law (e.g., defining participants
properly) may be warranted; alternatively, incentives and
sanctions that encourage misinterpretation need to be
examined and adjusted to bring grantee performance and
reporting in line with the legal intent. It is doubtful
that the Department will ever be able or wish to replace
grantee reporting on beneficiary coverage, but it must
accept responsibility for the accuracy of such reporting.

How to document the number of potential beneficiaries
not being served is quite another matter, however.
Establishing the universe of eligible participants falls
under the heading of needs assessment. The incentives
and disincentives for conducting accurate needs
assessment may be strong at the local and state levels:
there is an incentive when having more eligible
participants means getting more federal dollars; there is
a disincentive when federal dollars are accompanied by
matching provisions that call for greater contribution
from local and state than from federal sources. At the
federal level, there are also strong incentives: program
administrators who do not want to see their programs grow
are rare indeed; yet this responsibility is often
assigned to a program office, as was the case in
of the local context and of potential program benefits.


Accountability for Civil Rights Mandates 


Accountability for civil rights mandates takes two 
different forms in education. The first involves the 
enforcement of civil rights statutes in any way related 
to educational institutions, whether built into federal 
education legislation or decreed by federal courts, and 
is based on federal responsibilities under the 
Constitution. At the same time, the provision of 
educational services is constitutionally a state 
responsibility, delegated to local authorities. 

Enforcement of statutes relating to civil rights and 
equal educational opportunity has become the 
responsibility of the Department because it can withhold 
federal funds in the event of noncompliance. As with 
fiscal accounting, a separate office headed by the 
Assistant Secretary for Civil Rights is responsible for 
compliance, and it is not considered an evaluation 
activity per se. 

The second form of accountability arises because some 
civil rights statutes require certain kinds of 
educational services. Two groups are specifically 
covered in this manner: all handicapped children are 
entitled to a free appropriate public education under 
P.L. 94-142, and Title VII of ESEA (in accord with the 
Lau court decision) requires schools to provide 
instruction that does not put a non-English speaking 
child at a disadvantage. Such educational services that 
are spelled out in laws or in regulations tend to be
based on perceptions of constitutional rights rather than
on social science evidence about needed services.
Consequently, monitoring activities may overlap.
Responsibility for compliance with service mandates may
belong to the program office, but selective checks are
often carried out by the Office of Civil Rights. An
example is the labeling and testing of handicapped
children. Since these two ...



evaluation planning; otherwise it leads to inefficient
use of resources at best and antagonism between units of
the Department at worst.


Program Implementation 

Except for provisions connected with civil rights and
equal educational opportunity, federal education
legislation often does not spell out mandatory
educational interventions or treatments. The
constitutional delegation of responsibility makes
decisions in education a jealously guarded right of local
and state authorities. Exceptions are such demonstration
programs as Follow Through or Experience-Based Career
Education, in which school systems are given the choice
of one of several specified curricula. Since the
rationale of demonstration programs is developing and
testing effective interventions, documenting the nature
of the services provided through them ought to be an
integral part of any evaluation research associated with
them. There are also some ESEA titles that include
explicit process specifications, such as the requirement
for developing an individual education plan (IEP) for
every handicapped child served under P.L. 94-142. In the
case of such mandated educational processes, especially
those instituted on little evidence as to their effects,
more than mere compliance checking is also needed.

Evaluation should be carried out to find out the degree
to which such processes contribute to the overall goals
of the legislation, for example, to provide more
effective education for handicapped children or, in the
case of bilingual education, for children whose native
language is not English. Documentation of program
process and implementation has been carried out at all
government levels and, within the Department, by both the
cognizant program units and the central evaluation unit.



has called for information on program effects. In the
past, the response by OE has been the commissioning by 
the central unit of large-scale impact assessments that 
consume several years and millions of dollars, as 
exemplified by the sustaining effects study carried out 
by the Systems Development Corporation (1976, Baker and 
Ginsburg 1980). There have been several problems with
such efforts. First, what Congress often wants and needs 
is information on effective delivery, in the sense of 
having accurate accounting for how a law is being carried 
out, as described above. Better specification of the 
questions to be answered in any legislation calling for
assessment (as recommended in Chapter 2) would help avoid 
misdirected evaluation efforts; even more important is an 
ongoing dialogue on congressional needs between key 
congressional staff and Department staff responsible for 
evaluation. 

Second, even when assessment of program effects is 
called for, expectations of the size of those effects are 
often exaggerated because of unrealistic promises during 
the legislative and appropriation processes. But by the 
very nature of federal education programs, effect 
expectations should be modest. Whatever educational 
service is envisaged as a result of federal dollars, it 
will be delivered in a decentralized manner through some 
16,000 local school systems in the public sector 
comprising nearly 90,000 school buildings. There are 
more than 2 million teachers in the public school 
systems, and another 250,000 people are teaching the 10 
percent of students in nonpublic schools. (Private 
school students also receive benefits under Title I and 
other federal programs.) Federal programs operate at the 
margins of this huge enterprise, providing 8 percent of 
all revenue for public elementary and secondary schools 
(Dearman and Plisko 1979). Moreover, most federal 
programs are geared to specific populations; in those 
cases, support for core education, the major 
responsibility of the teacher, is expressly ruled out. 

Yet the children who receive benefits from any of the 
federal titles do not do so in isolation from the rest of 




country in school achievement of a target group or 
lessening of racial tensions—is to ignore the nature of 
the educational system in this country. 

When the effects of a given program are modest, their 
estimation is a complex, difficult, and costly task. 
Such estimation should be done only when it is likely to 
affect program decisions (for example, in the case of a 
limited experimental program) and only by the most 
competent evaluators and evaluation organizations. 


Program Planning and Improvement 

One of the Department's responsibilities is to provide 
leadership for improving education in this country; 
therefore, it ought to carry on a set of prospective 
activities designed to improve the substance of existing 
programs and to develop new programs. The relevant 
evaluation activities are summarized in Chapter 2: needs 
assessment, identification of interventions likely to 
relieve the need, small-scale testing of proposed 
programs under optimal conditions, field evaluation under 
actual operating conditions, and analysis of likely costs. 

Such a process of program planning should operate both 
at the national level and in selected states and 
localities that have the resources. A similar set of 
activities is relevant to program improvement, although 
the need and the general nature of the program may 
already be established. Too often, however, the 
exigencies of the budget process and the demands from 
those concerned with implementation of current programs 
relegate the planning of new programs and the improvement 
of established ones to a low priority. The tracing of 
benefits already legislated and the assurance that 
programs are carried out as intended take first 
priority. Development of knowledge needed to formulate 
better programs is a long-term process, with no assurance 
that the outcomes will be immediately applicable. In view
of pressures for greater accountability and improved 
program management, it may be argued that activities 
aimed at the substance of programs should be relegated to 



evaluation and research planning. This kind of
coordination recognizes that, particularly for existing
programs, program managers should be involved in the
design and testing of alternatives. They can provide
necessary experience regarding current program
operations, and they are likely to have ideas for
improvement. But the overall effort should be in the
hands of research-trained people whose full-time
attention can be devoted to evaluation activities.


Evaluation as a Management Tool 

In an examination of the use of social science 
information by federal executives, Caplan (1976) found 
that, in the Office of Education, more program evaluation 
was conducted and less of the information generated was 
actually used than in any other agency examined. It may 
be that, in its past emphasis on rigorous studies of 
program effectiveness, the central evaluation unit of the 
Department was not satisfying the information needs of 
the most powerful audiences, namely, the legislative and 
executive branch overseers. Their primary interest is in
fiscal and beneficiary information, which provides an 
effective tool for holding managers at all 

levels—federal, state, and local—accountable for proper 
distribution of benefits. In fact, OMB Circular A-117
(Office of Management and Budget 1979) requires both 
management and program evaluation of every agency 
(including an annual report) and ties this activity
directly to the reward system for federal managers 
included in the recent civil service reforms. 

Problems are likely to arise, however, when 
accountability demands are taken beyond ensuring that 
resources are properly allocated. Who is to be held 
accountable for program effects that will probably be 
modest and difficult to estimate? As Cronbach et al.
(1980) point out, condemnation of individuals for
weaknesses or "failures" that occur in a system over
which they have little control is a perversion of the use



or federal program managers. This is not to argue that
studies of program implementation and of program effects 
should not be done, only that they are unlikely to be a 
useful management tool. 

There is a second problem with using evaluations of 
program effects for trying to improve program 
management. The fear that programs will be curtailed 
because of negative findings is aggravated in today's
climate of tightening budgets. Even if in the past there
have been few examples of established education programs
that have been cut severely or abolished as a result of
evaluation findings, the threat is real. Line managers
and top officials wanting to build programs and budgets
are not likely to cooperate enthusiastically in
evaluations they perceive to have the potential of
damaging their programs.


CURRENT ORGANIZATION 

How effectively is the Department now organized to carry
out its evaluation responsibilities? Figure 3
illustrates the organization of the Department as of
January 1981; Figure 4 places the central evaluation
unit, which carries major but not sole responsibility for
evaluation, in its current context. 

For evaluation activities other than fiscal accounting 
and civil rights enforcement, legislation and 
administrative actions have created a hodgepodge of 
evaluation responsibilities and assignments, based more 
on the power base and history of individual programs than 
on rational planning. After an analysis of major 
education programs, Cordray, Boruch, and Pion found: 
"Programs differ markedly with respect to the number and 
types of evaluation mechanisms that are described within 
the law and by federal regulations" (in Boruch and 
Cordray 1980, Ch. 3:7). Thus, states and localities may 
or may not be charged with producing performance reports, 
doing needs assessments, and carrying out studies of 
program improvement and program effects. For some 
programs, states are supposed to monitor local programs 



FIGURE 3 Organization chart of the Department of Education 









have distributed evaluation responsibilities as much on 
the basis of the political strength of individual program 
administrators and their constituencies as on any basis 
connected with the quality or integrity of evaluations. 

There has been a central evaluation unit at the 
national level for a decade, but its responsibilities 
have varied, even as funding has increased (see Appendix 
A). After the unit was established in 1970, evaluation 
activities began to be centralized. The central unit 
acquired staff, a budget, and responsibility for national 
studies. This centralization was instrumental in 
introducing rigor, integrity, and visibility to the 
evaluation efforts mandated by Congress and sponsored by 
OE. For several years, budgets and responsibilities 
increased. But as dissatisfaction developed with the 
perceived lack of timeliness and relevance of some of the 


studies—not to mention unhappiness with some findings 
deemed potentially damaging—pressure increased for 
certain programs to be responsible for their own 
evaluation activities. At present, some programs include 
virtually no evaluation activities other than obligatory 
program monitoring; others delegate evaluations to the 
central unit; still others conduct all their own 
evaluation activities. In addition to the central unit 
and program units, evaluation activities are also carried 
on in the research unit (Assistant Secretary for Research 
and Improvement), the planning unit (Assistant Secretary 
and Budget), and at the Secretary's level.
Until 1979, there was no overall evaluation planning or
coordination of evaluation. 

Congressional restiveness with the performance of this 
nonsystem led to still another layer, mandated 

studies to be carried out by a designated
unit: NIE in the case of studies on compensatory
and vocational education, NCES for a study
on discipline in the schools (P.L. 93-...), ...
office in the case of a study on school ...



administrative reasons bears reexamination in the light 
of some reasonable criteria, such as: the type of policy 
question to be asked and the information needed; the most 
effective and efficient ways of obtaining the needed 
information; the intended use of the information 
(likelihood that use will occur may depend on how and by 
whom the information is generated); the size and nature 
of the program; and the research capacity of the unit 
considered for assignment of evaluation responsibility. 
The application of such criteria will indicate what 
changes might be made to improve the current organization 
of federally funded evaluation activities related to 
education. But since there is no one best way to 
organize these activities, the implications the Committee 
has drawn from the preceding discussion are presented 
below as suggested alternatives rather than as 
recommendations. 


Centralization Versus Decentralization 

Organizational researchers and management experts have 
debated the merits of centralized organization compared 
with those of incrementalism and mutual adjustment 
brought about through coordinative mechanisms among many 
autonomous units. Each form of organization has its 
costs as well as its benefits. Central organization can
lead to more coherent activity, but it is time-consuming 
as the decision process works up through the hierarchy 
and back down for execution. It may also seem capricious 
and arbitrary, especially in complex situations and 
situations of uncertainty. Such conditions are 
characteristic of most evaluation planning related to 
social programs. On the other hand, while decentralized 
planning and execution can come closer to satisfying 
needs of individual units at the federal, state, or local 
level, it can lead to duplication, wasteful use of scarce 
human and fiscal resources, and low quality. Attempts to 
minimize these negative consequences through purposeful 
coordination will, like other centralizing mechanisms, 
exact high costs in time. 





has overall responsibility for program evaluation, 
some evaluation activities need to be decentralized; in
fact, present law and custom so dictate. But planning 
directives for 1980 manifested an attempt to recentralize 
evaluation activities through review and approval by the 
central unit of all evaluation plans. No parallel 
attempt is evident with respect to evaluation activities 
funded by federal funds at the state and local levels, 
except to provide technical assistance in the case of 
Title I evaluations. 


Decentralization Among Levels of Government 


As noted, evaluation requirements levied upon local and 
state agencies vary from program title to program title. 
(For summary descriptions of requirements in major 
titles, see Cordray, Boruch, and Pion, Ch. 3 in Boruch 
and Cordray 1980). Generally, reporting requirements 
appropriately emphasize the collection of information on 
beneficiaries served and on distribution of resources. 

For a number of titles, the states carry the 
responsibility of aggregating data provided by each local 
education agency. But state-level reports have seldom 
been able to make statements about how programs operate 
throughout the state as a whole, partly because local 
agencies were not reporting data of sufficient quality 
and uniformity to allow aggregation. Consequently, 
states have also acquired some responsibility for 
technical assistance. For certain titles, localities are 
also required to identify the number of individuals in 
the target population (for example, for the handicapped 
covered in P.L. 94-142). Since identification of
individuals generally leads to the need to serve them, 
and federal funds by no means pay the total cost of 
service, there are considerable disincentives to 
comprehensive needs assessment carried out by local 


In addition to reporting on the distribution of
on the numbers and types of both potential


concerned with educational effectiveness. In many cases,
however, major expenditures of their own funds reported 
by local agencies as evaluation of program effectiveness 
are for testing designed to track general student 
achievement rather than specific effects traceable to any 
one program. It appears to be the intent of current 
requirements that local evaluations serve auditing and 
monitoring purposes while at the same time also informing 
local program developers and administrators on the best 
implementation strategies. As illustrated by the history 
of Title I evaluations (summarized in Chapter 4), 
stipulations for local and state evaluation activities 
have shown a confusion of purpose between assessing the 
extent to which programs are providing benefits and 
mandated services and determining ways in which local 
programs might be improved. Local evaluators are forced 
to use designs and methods to collect data that can be 
aggregated at the state and national levels, but such 
data do not serve the local needs well. Moreover, those 
data have not even proved useful in providing statewide 
or nationwide overviews; separate state or national 
studies have been needed for that purpose. Though some 
data collected at the local level might serve both local 
and national purposes, each type of evaluation question 
has distinctive design and measurement requirements (as 
discussed in Chapter 2) and implies different 
relationships among the three levels of government. 

We have noted in Chapter 3 the variable quality of 
evaluation activities carried out at the local and state 
levels and have recommended that Congress consider a 
diversified strategy of evaluation requirements at these 
levels (Recommendation C-3). In Chapter 4 we discussed 
the need to build in the concerns of target audiences 
from the beginning to increase the likelihood that 
evaluation findings will be used. Consideration of how 
scarce evaluation resources can be best employed to yield 
reliable information that is useful to the maximum number 
of audiences reinforces the notion that division of 
evaluation responsibilities deserves more careful 
analysis than it has received. 

All grantees receiving federal funds for education 


state and the federal levels. The impression persists 
that grantee application and reporting requirements are 
intended to cover all bases and collect every conceivable 
bit of information, creating such an overload that most 
of the data pour in without being scanned, let alone 
used. For example, in the migrant education program, OE 
required the states to send copies of all subgrants to 
OE. According to the program auditors, this mountain of 
information simply collected dust in a storage area with 
no attempt made to review it (Rock 1980). The practice 
was ended as a result of the program audit. More 
carefully considered requirements would reduce costs and 
response burden and provide fewer and briefer reports 
more likely to be reviewed. 

Requirements that go beyond the basic reporting needed 
for accountability functions should not be levied on all 
localities and states alike. Questions on how a program 
actually operates in the school, questions on the 
detailed nature of the services and variations in 
different localities, and (most difficult of
all) questions on the educational effects traceable to a
specific program need not be answered by all localities 
or grantees. Cost effectiveness questions dealing with 
the desirability of different program alternatives are 
probably an even less appropriate requirement at the 
local and state levels. Scarce evaluation resources are 
frittered away when demands are made of all that could be 
responded to more effectively by selective sampling in 
nationwide studies or by studies carried out by 
individual local systems or states with proven competency 
and sufficient fiscal and human resources to evaluate 
their own programs. These considerations lend additional 
force to the recommendation made earlier:


Recommendation C-3. Congress should institute a
diversified strategy of evaluation at the state and local
levels that would levy minimum monitoring and compliance
requirements on all agencies receiving federal funds, but
allow only the most competent to carry out complex
evaluation tasks.





The objective of this recommendation is to improve the
quality of data needed for accountability without
increasing the burden of response on local and state
agencies. Such data items as distribution of funds,
number and types of beneficiaries being served, and
specific program services should be defined by the
Department so that local and state agencies know exactly
what reporting is required of them. Quality control
procedures should be enforced so that performance reports
can be made to Congress. Before setting the
requirements, however, the Department needs to examine
its own capacity to deal with local and state reports so
as to avoid collecting information that is never used
because of the sheer inability of federal staff to deal
with the volume.

In order to assist agencies in complying with federal
reporting requirements, the Department should extend
technical assistance as recommended above (Recommendation
D-8). One way to provide such assistance would be to
select local and state agencies doing an exemplary job of
reporting. If none exists, the Department should fund
the development of such examples. Care must be taken to
select different types of locales exhibiting a variety of
student, teacher, and resource mixes. The exemplary
procedures should then be actively disseminated through
existing channels, for example, the Department's regional
offices, the Title I TACs, the NDN, or the state agencies.

A second way to provide technical assistance would be
to make funds available to selected exemplary local
agencies to provide technical assistance on meeting
reporting requirements to less skilled school systems of
comparable type, something like the
"developer/demonstrators" funded by the NDN (Far West
Laboratory for Educational Research and Development 1979)
to provide training, materials, and technical assistance
for adopting exemplary education programs. After the
first 2 or 3 years, such funding should be based on the
success of an agency designated to provide technical
assistance in improving the reporting of those receiving
the assistance.



activities within the Department. 

The Office of the Inspector General should continue to 
monitor whether funds are distributed according to law 
and are allocated for the prescribed purposes. When 
questions arise as to whether such additional services as 
the law mandates are being provided to the target 
population(s) (rather than the funds being used for
regular school operations), they need to be investigated
through evaluation strategies and methods appropriate to 
documenting the nature of program interventions. This 
type of evaluation requires research capabilities beyond 
the scope of the Office of the Inspector General. 

Accountability questions on beneficiaries served and 
on program delivery should be monitored by officials who 
administer the programs at the federal level, namely the
Assistant Secretaries for Elementary and Secondary 
Education, for Special Education and Rehabilitation 
Services, for Post-Secondary Education, and for 
Vocational and Adult Education and the Director of 
Bilingual Education and Minority Languages Affairs. 
Responsibilities should include the monitoring of program 
coverage and of provision of services mandated by law and 
regulation (including such associated requirements as the 
setting up of parent advisory councils). Where civil 
rights laws are involved, the Assistant Secretary for 
Civil Rights has and should continue to have 
responsibility. Much of the information on program 
coverage and delivery should be obtainable through 
focused grantee reporting using adequate quality control 
and technical assistance measures, as discussed above. 

There is continuing need for a central evaluation unit 
to carry out activities not directly linked to program 
accountability. First, the unit should sponsor, on a 
sample basis and in cooperation with the program unit, 
documentation of program process and detailed 
implementation so as to provide insight on how 
educational services have been changed. Second, also in
cooperation with the cognizant program units, the central
unit ... program improvement or development
studies, including needs assessment and understanding of



proposed changes in law or regulation. Third, when the
issue is educational effectiveness, the unit should carry
out, in cooperation with the program offices, needed
evaluability studies to define objectives and appropriate
measures. Only if such measures can be successfully
established and only if a program is of the type and at a
stage to allow impact evaluation (see Chapter 2), should
such a study be undertaken and then only if the need for
it can be justified.

The reason for assigning shared responsibility for
these activities is that program administrators
presumably have in-depth knowledge of their programs and
an interest in improving educational substance, but they
may also have a vested interest in current operations.
At the same time, the central unit is likely to have less
program expertise but a greater concentration of
evaluation talent and social science expertise. When
such talent and expertise can be found to an adequate
extent in a program office, it may take the lead, with
the central evaluation unit as the cooperating office.

The central unit should also, from time to time, run
checks on accountability information developed by program
offices and the Inspector General and, when necessary,
conduct its own studies. Precisely how all these
evaluation responsibilities are shared between the
central unit and program offices ought to be a function
of the expertise residing in each program office.

Three functions are appropriately shared between the 
central unit and NIE (which is under the Assistant 
Secretary for Educational Research and Improvement). The 
first is cost-benefit studies designed to establish the
efficiency of alternative ways of obtaining the
objectives of a given program. Such studies require all
the expertise needed for assessing program effects and
tying them to specific components of the program
treatment. In addition, benefits and costs of the
program must be put in monetary terms, a difficult
conceptual problem. Cooperation with NIE is suggested
because of the breadth of skills required and because it
may be necessary to conduct basic research in how to do
cost-benefit studies in education. Each particular
instance of doing such a study will provide material for
theoretical research and should be fully informed by it.
The two units should also jointly administer the types of
grant programs suggested in Chapter 3 for local and state




particular federal program, especially those concerned 
with developing knowledge on more effective educational 
interventions, should be supported or carried out by the 
research arm of the Department, that is, NIE and other 
units within the office of the Assistant Secretary for 
Educational Research and Improvement. 


Coordination 


Decentralization creates the problem of effective use of 
evaluation dollars that are dispersed among three levels 
of government and among many units of the Department of 
Education. A first but not sufficient requirement to 
address this problem is adequate reporting. The lack of 
information on the amount of evaluation dollars spent at 
the state and local levels has already been discussed, 
but even accounting for evaluation dollars within the 
Department becomes a matter of definition, depending on
a particular unit's need or desire to display or hide its
evaluation activities. In Chapter 3 we recommended
that Congress segregate evaluation funding at the state 
and local levels from program funds and administrative 
costs and require an annual accounting; we repeat those 
recommendations here. 


Recommendation C-2. Congress should separate funding for
evaluations conducted at the state and local levels from
program and administrative funds.

Recommendation C-4. Congress should require an annual
report from the Department ...

in its ... and the planning component, which is currently
separated from the central evaluation unit. (See the
discussion below on the placement of the central
evaluation unit.) Coordination also should contribute to
more efficient use of evaluation resources. For the four
phases of evaluation—planning, design of specific 
studies and procurement mechanisms, review, and use of 
findings—there are several ways in which authority and 
control could be distributed, i.e., in which evaluation 
activities could be coordinated:

1. The head of the central evaluation unit or 
cognizant assistant secretary could have both the 
responsibility and the authority (that is, final sign-off 
power) for approving plans, design and procurement, 
findings, and their dissemination. Insofar as possible, 
this person (office) could also set up incentives for
application of findings or sanctions against nonuse.

2. The central unit could have major responsibility 
for coordination of planning, for review of designs and 
quality of procurement (but no sign-off power), and for 
review of findings together with the initiating unit, 
with dissemination also shared with that program unit.

3. Besides carrying out its own projects, the central 
unit could provide technical assistance (when asked) to
other units engaged in evaluation activities, but have no
further authority or responsibility. In this case,
coordination responsibility or authority would either be
assigned to some other level (say, the Secretary's or
Undersecretary's office) or not assigned at all, as was
the case for the Education Division within HEW until 
recently. (While HEW's Assistant Secretary for Planning 
and Evaluation received evaluation plans from the whole 
Education Division, generally only those from the central 
unit were reviewed; see Appendix A.) 

The Committee believes that, for each phase of 
evaluation, a different degree of sharing of
responsibility and authority is appropriate.
Relationships should also vary depending on the nature of
the evaluation activity and the degree of expertise 



evaluation dollars, but also the failure to use 
evaluation findings and the inability to cumulate 
knowledge about programs. The cost of any degree of 
coordination is time—more staff time for communication 
and more executive time for making decisions. Therefore, 
no matter what coordinative mechanisms are adopted, the 
Committee suggests that both the time invested and the 
results be tracked with some care, so that the effort to 
use evaluation resources wisely does not end up leading 
to negative results. For example, staff may get so 
occupied with meetings, with defenses against criticisms, 
and with waiting for decisions that they have inadequate 
time to produce procurement requests of high quality, to 
effectively monitor evaluation studies, to respond to 
modification requests from contractors or grantees, to 
review reports in detail, or to disseminate findings. 
Tracking of how well coordination procedures work should 
lead to their reexamination periodically, perhaps every 3 
or 5 years. The rest of this section presents our 
suggestions for the Department with regard to 
coordination at each stage of the evaluation process. 


Planning 

We believe planning should be centralized, with all
units (program, policy and planning, budget, research,
etc.) involved at the staff level and with sign-offs
required by each assistant secretary. The assistant
secretary responsible for evaluation should take the lead 
for the coordination of planning. The central unit 
should carry responsibility for developing, together with 
the cognizant program units, a coordinated plan, 
including a series of related studies, for each of the
large federal education programs, as exemplified by those 
for Title I and P.L. 94-142. The central unit also 
should be charged with the coordination of all evaluation 
planning, even though the planning and execution of 
specific studies may be carried out elsewhere—a program 
office, the research unit, or even the local or state 
level. 



Incremental program information of the kind needed by 
decision makers both in Congress and within the
Department. Improved evaluation planning will clarify
data and information needs for evaluation and allow the
Secretary to assign priorities to them in the context of
other data gathering needs. Recommendation D-10, which
speaks to this issue, is repeated here: 

Recommendation D-10. The Department of Education should
institute a flexible planning system for evaluations of
federal education programs.

In Chapter 3 we emphasize that planning for evaluation 
cannot be a totally internal activity. Outside groups 
having a stake in a program must be consulted. Since the
Department's top priority external audience is Congress,
the Department needs to develop better liaison regarding 
evaluation activities with members and with congressional 
staff. Congressional aides have been very critical about 
the relevance, timeliness, and packaging of evaluation 
reports (see Zweig 1979). More involvement of 
congressional staff is needed in selecting basic issues 
and questions that can be answered by the evaluation 
process. The central evaluation unit, being more removed 
than program administrators from the politics surrounding 
particular education programs, should be charged with the 
responsibility of communicating with Congress about 
evaluation needs (see Recommendation C-1). Program
units, on the other hand, tend to be closer to such 
constituency groups as representatives of target 
populations and educators charged with carrying out the 
programs; therefore, they should be responsible for 
obtaining their participation in the planning for 
individual studies as well as in the development of the 
overall plan. 



documents. Final veto or sign-off power, however, should 
not reside with these committees but with the cognizant 
assistant secretary supervising the unit that prepared 
the design or the procurement instruments or grant 
guidelines. If technical or substantive criticisms are 
made by the reviewing committee, the cognizant assistant 
secretary should require responses from the originating 
unit that either refute the criticisms or indicate 
changes made as a result. If the central unit is the 
sponsor of the study, the process should be reversed, 
with the relevant program unit providing review. The 
central unit should also have staff available to provide 
technical assistance during the execution of a study, 
that is, when staff from other units monitoring an 
evaluation contract might call for assistance in 
reviewing progress or authorizing changes in study 
direction, design, test instruments, analytical 
strategies, and the like. 


Review of Findings 

The process for review of findings, either at an interim 
stage or in final reports, should be similar to that 
suggested for the design and procurement of studies. 
Technical committees drawn from the staff of the central 
evaluation unit and the Assistant Secretary for 
Educational Research and Improvement (possibly the same 
ones involved in the design and procurement phase) should 
review reports and associated materials. Comments should 
be forwarded to the originating unit, with a requirement 
for rebuttal or incorporation of changes responsive to 
the technical review. Program units should be afforded 
the same review opportunity for studies originating in 
the central unit. These internal reviews of designs and
of findings should be preliminary to the external reviews 
suggested for each of these phases in Chapter 3. 



proposal and subsequent contract or grant. The 
originating unit's dissemination plan would be reviewed
along with other features in the design and procurement
phase. The originating unit should have the 
responsibility for carrying out the dissemination plan 
addressed to the primary audiences, who presumably are 
closely tied to the originating unit. The central unit 
may carry out dissemination to secondary audiences as it
deems appropriate.

The central unit should also serve as the storehouse 
and coordinating center for information derived from all 
evaluation activities, including not only studies 
originating in the Department, but also those carried out 
by state and local agencies and even work relevant to
education that may not have been federally funded or be 
concerned with federal education programs. The unit 
should be responsible for cumulating knowledge from these 
sources, reanalyzing data, and refocusing information 
necessary to suggest changes in legislation, in
regulation, in program management, or in program 
intervention as evidence indicates. Other units, 
particularly the Department's research arm, should 
cooperate in this integrative function. 

Functioning as something like a nerve center for 
evaluation information, the central unit should also be 
charged with getting relevant information to audiences 
that can act on it or are likely to have an interest in 
it, beyond the audiences already included in the 
dissemination plans for a specific study, as noted in the 
following recommendations from Chapter 4: 

Recommendation D-13. The Department of Education should
ensure that dissemination of evaluation results achieves
adequate coverage.

Recommendation D-14. The Department of Education should 
observe the rights of any parties at interest and the 
public in general to information generated about public

could also devote time and energy to the communications
problem. Too many evaluation reports are cloaked in 
jargon that is unintelligible to decision makers and 
other nontechnical audiences. Although most evaluation 
contracts now specify that an executive summary must 
accompany the final report, insufficient attention to
effective packaging of evaluation findings continues to 
be the rule. Too many reports are not read or not 
understood by busy policy makers or by outside groups 
that could use the information because the language of 
the reports is unclear. There is a real difference 
between ambiguity of findings, which can be expected for
large, complex programs that encourage local variability,
and the inability to present those findings in 
understandable prose. Personnel in the central unit 
charged with responsibilities for disseminating 
evaluation findings must perform the translation from 
scientific jargon to clear English when such translation
has not been adequately done by contractors or grantees.
In order to be effective in this role, however, central
unit dissemination staff must possess requisite
communication skills and must be insulated from political
pressures that otherwise will quickly undermine the 
credibility of their work. 


Location of the Central Evaluation Unit 

We have proposed that the central evaluation unit be 
charged with important coordinating responsibilities in 
developing the Department's overall evaluation plan and 
in synthesizing and disseminating evaluation-related 
knowledge derived from all sources. We do not foresee 
that these responsibilities can be adequately carried out
as long as the central evaluation unit is subsumed within
the management arm of the Department. The implicit
message of this arrangement is that only the management
perspective of evaluation is considered a high priority.
While some members of the Committee favor the 
assignment of an assistant secretary to the evaluation 



unit is probably the only one that could provide the
Secretary with a comprehensive view of the amount of
money being spent for evaluation, of the types of
evaluations under way, of the effectiveness of the
various disparate parts of the evaluation "system," and
of the potential for using study findings to make more
informed decisions about programs.

A variety of administrative mechanisms can be used to 
improve the current situation. For example, the 
Department could make the unit a separate office 
immediately responsible to the Secretary or the 
Undersecretary to provide the needed access and 
credibility. A precedent exists in the case of the 
Office of Bilingual Education and Minority Languages 
Affairs. Another possibility for making the unit more 
effective is to couple it more closely to the major 
planning function. We would caution, however, that some 
separation should be maintained between evaluation and 
budgeting. Though these functions are often located 
together, subservience of evaluation to the budgetary 
process is as counterproductive as using evaluation to 
chastise or reward individual program managers, 
apparently the Department's current direction. If 
budgetary decisions and the handing out of rewards or 
sanctions are to be the main functions of evaluation 
activities, they will be devalued as a means for program 
improvement. As long as evaluation is seen as a 
threatening rather than as a supportive activity, those 
who are subject to the threat will find ways of defusing 
it by covert lack of cooperation or outright opposition. 
As a result, evaluation activities will continue to be 
curtailed, and results consigned to the dusty shelves of 
unused reports. 


CONSTRAINTS 

No matter how evaluation responsibilities are assigned 
and organized, the Department has to face some important 
constraints that are only partly under its control: 
constraints of budget, of staff, and of process. 





through specific set-asides for evaluation (for example,
the half-percent of program funds mandated for national 
evaluation of Title I). However, as a consequence of the 
dispersion of evaluation responsibilities, the central 
unit spends less than half the money invested in 
evaluation at the national level: $19.6 million of the 
$43.4 million estimated for the whole Department 
(including the inspector general) in 1980. (For an 
estimate of evaluation spending by various components of 
the Department, see Appendix A.) As already noted, 
additional federal funds are spent at the state and local 
levels for evaluations. With respect to accountability
of spending for evaluation, then, there is trifurcation 
of responsibilities: the central evaluation unit, 
program units of the Department, and states and 
localities. But only the central unit has been the 
object of major scrutiny and a decreasing budget, while 
responsibilities and funds are idiosyncratically assigned 
by legislation or executive practice to selected federal 
program offices and to state and local authorities, often 
without similar scrutiny of performance. 

In the last 3 years, the Department has not been 
successful in convincing the appropriations committees of 
Congress that an increased budget for the central 
evaluation unit was warranted, even while authorization 
committees have asked for more evaluation. In fact, 
funds have been appropriated for evaluation activities
outside the central unit, and Congress has spent
additional funds on its specially commissioned studies.
These actions appear to reflect an inability to make a
convincing case for the work of the central unit,
although it is not clear whether the apparent ...
leading to decreasing budgets has been
due to inadequate performance or has been due to
greater visibility and scrutiny.


Staff Constraints

We have commented previously that the complexity of any
evaluation process beyond tracing money and counting



interest groups who represent program beneficiaries and
service providers. Having practical program knowledge
and experience is helpful as well, though this can be
supplied through cooperation of the relevant program
units.

The staffs of evaluation offices have to be able to
explain issues involved, to develop questions to be
answered, to suggest methodologies for research, and to
prepare statements of work for RFPs and other procurement
documents. They have to participate in panels that
establish criteria and make recommendations for the
selection of winning contractors. They are also likely
to negotiate substantive contract issues before awards
are made. After a contract is awarded, the cognizant
staff person or project monitor must be able to provide
technical assistance if needed by the contractor, assist
in clearing survey instruments, and rule on modifications
requested by the contractor. In order to respond
effectively to contractor requests, the staff person
needs to understand through first-hand research
experience whether requested changes are appropriate or
not. Throughout the course of a project, staff members
must provide professional review, including careful
examination of final reports.

The unusual array of skills, experience, and diverse
perspectives needed to manage evaluation programs is not
readily obtainable. The Department is limited in its
ability to recruit top-quality staff in adequate numbers
because of personnel ceilings and other civil service
constraints. The Committee has not had time or
opportunity to assess the qualifications of the staff in
the central evaluation unit, though there are obvious
gaps in disciplinary expertise, in the representation of
minorities (see Chapter 3), and in hands-on experience
with field-based applied research studies of the kind
being designed and monitored by the unit. What seems
clear, however, is that the current deployment of staff
and assignment of responsibilities does not take
advantage of the collective expertise in the central unit
or in the research components located elsewhere in the
Department (for example, in NIE or the National Institute



technical detail is to leave little room for creativity. 
Nor is it likely that the expertise represented by the 
central unit is duplicated in every program office with 
evaluation responsibilities. In some cases, evaluation 
work carried out elsewhere in the Department may open up 
innovative ways of planning and designing studies, as has
been true for the NIE compensatory education study and 
the evaluation plan for P.L. 94-142. Both these 
instances come from units with research expertise. Other
program offices, however, are unlikely to be able to
staff up for the evaluation responsibilities now assigned
them or that they might acquire in the future. 

Recommendation D-17. The Department of Education should 
examine staff deployment and should establish training 
opportunities for federal staff responsible for 
evaluation activities or for implementation of evaluation
findings.

The Department should consider alternative ways of 
using the technical staff within the central unit and 
evaluation staff in other units. Duties and 
responsibilities would vary according to the amount of 
government control exercised by staff: grants and 
consultancies entail the least control, contracts and 
evaluation teams configured of government staff and 
outside experts more, and in-house studies the most. 
Figure 5, adapted from one originally prepared by Wargo 
(1980), illustrates the three major relationships between
government staff and outside experts and some of the 
characteristics of each alternative. The Department has 
largely used the contracting method, though in-house
analysis has been characteristic of selected areas, 
particularly for postsecondary programs. There may be 
evaluation work that is better addressed by the 
grant/consultantship method (see Chapter 3) or by an 
evaluation team. In part, the choice depends on the type
of evaluation work to be undertaken, but staff capability
is an equally important criterion. The greater the 
degree of government involvement, the greater the skills 




must be an adequate number of staff, and they must have
the requisite training and experience. Moreover, a work
atmosphere conducive to attracting good staff and holding
them must be provided. The Department should examine the
number and types of positions assigned to evaluation 
activities in light of responsibilities and work load 
(number of RFPs to be prepared, contracts monitored, 
final reports to be analyzed, etc.) within the central 
evaluation unit and wherever else evaluation activities 
are carried out. It should also examine the extraneous 
and counterproductive demands that are imposed on staff 
through internal procedures that could be simplified. 
Consideration of personnel needs should also take into 
account the time required for the type of training 
suggested below. 

The academic and experience background of personnel 
charged with evaluation responsibilities should be 
examined in connection with the tasks they are required 
to perform. This applies to staff in program units as 
well as to staff in the central evaluation unit. If 
necessary, training programs should be conducted to 
prepare staff members for the writing of work statements,
to familiarize them with new evaluation techniques, and 
to strengthen their knowledge of selected social science 
disciplines. Handbooks should be prepared for persons 
who monitor the substantive aspects of evaluation 
contracts. If federal personnel lack field experience, a
determined effort should be made to expose them to
practical situations affecting the evaluation process.
Short-term field assignments could be used to provide 
national office personnel with needed practical 
experience. 

At the same time, as noted in Recommendation D-4, 
program executives and staff as well as other line 
executives outside the units specifically concerned with 
evaluation would benefit from greater knowledge of the 
language of evaluation and how evaluations can be used.
Program managers at the federal level play a variety of 
important roles in the evaluation of education programs. 
Program managers often suggest which of the national 



program delivery and program effects that must be
translated into the evaluation questions to be asked.
Program managers need to provide key questions to the
evaluation experts, spell out what they consider to be
indicators of successful performance, and so on. During
the course of a study, managers often assume the role of
co-monitor and may accompany the technical evaluation
team into the field to assess progress. At the end of an
evaluation, managers play an important role in the
interpretation of the results. All of these roles would
be significantly improved if managers had a better
understanding of the basic principles of evaluation.
Training for federal staff on relevant topics should be
instituted. Seminars in evaluation methodology and in
applications of social science research to program
improvement could be given by technical staff from the
central evaluation unit and the Department's research arm
and by external evaluation experts. A newly created
training unit within the Department, the Horace Mann
Institute, provides an appropriate internal vehicle.
Other alternatives include specially tailored offerings
of the Federal Executive Institute and the Graduate
School of the Department of Agriculture (which is
scheduled for transfer to the Department of Education).
In addition to providing some technical knowledge,
training should increase the understanding of program
managers about what kind of information evaluation can
and cannot provide.


Process Constraints 

In a number of ways, the Department's own procedures
inhibit its ability to produce timely and relevant
evaluation studies of high quality. These procedures
affect each stage of the process: producing a coherent
set of plans for the whole Department, designing
individual studies, procurement, launching the study once
a contract or grant has been awarded, monitoring its
progress, and disseminating its findings. A typical time
chart for a relatively straightforward study that is
intended to take 12 working months for design, data
collection, and analysis is pictured in Figure 6: under
current conditions, a lead time of 3 years is necessary.



FIGURE 6 Time chart: typical evaluation study








delay approval of studies as illustrated by the 1980
procurement schedule (see Chapter 3, Table 1). Delays in
the planning process may create postponement of studies 
into a new fiscal year. An even more adverse effect 
(also noted in Chapter 3) has been the unwarranted
compression of time for the most difficult intellectual
work: design of a study by federal staff and by
responding proposers. The planning process is under the
control of the Department; presumably, as planning 
mechanisms become better established, time delays can be 
reduced. 


Procurement 

The procurement process or any alternative mechanism for 
getting the work done entails negotiations within the 
Department between the unit designing the evaluation and 
the relevant program unit (if the study is not conducted 
there) as well as other parties at interest, for example, 
the Office of Civil Rights, the offices of the 
Undersecretary or the Secretary, the Assistant Secretary 
for Planning and Budget, or the National Institute of 
Education. In selected cases—for example, in Title I 
evaluations in which the legally constituted advisory 
council participates—outside groups are also involved. 
(We note that our recommendations in Chapter 3 with 
respect to opening up the procurement process in order to 
enhance the quality of evaluations will further 
complicate the process and may introduce additional time 
lags.) A major party to such negotiations is the Grant 
and Procurement Management Division, which must approve 
all procurement instruments or grants announcements. The 
federal competitive procurement processes as interpreted 
and enforced by this division take, on the average, 6 
months from review of the statement of work prepared by 
the initiating office to the time of award, exclusive of 
response time allowed between announcement of RFP or 
grant guidelines and the proposal due date. 
Noncompetitive processes, such as sole-source awards or 
unsolicited proposals, can be completed in shorter time. 


the costs of evaluations are increased by the 
considerable—though hidden—costs of the process 
(preparing RFPs, writing lengthy proposals) that are 
built into internal staff salaries and the total costs of 
the resulting contracts. At the same time, the losses 
that result from the process are considerable: 
limitations on creativity and quality, time delays, and 
wasteful use of human resources inside and outside 
government. Though the way the government obtains 
research services is generally regulated by statutes that 
pose external constraints, any federal agency has 
considerable latitude in its interpretation of applicable 
regulations. Differences in operating procedures are 
readily discernible to individuals familiar with several 
agencies. The Department of Education would profit by 
examining the more flexible strategies of other agencies. 


Launching a Study 

For any study that involves collecting the same 
information from nine or more respondents, OMB clearance 
(which may be delegated) must be obtained. When this 
requirement was first instituted by OMB, there were three 
reasons: to assure adherence to statistical standards, 
to allow OMB to judge the economic impact of a proposed 
study, and—most importantly in recent years—to reduce 
the burden on respondents imposed by the multiplying 
demands for data. Reduction of the response burden 
remains a major objective for both the administration and 
Congress. As more and more data collection efforts in 
education became subject to clearance (e.g., program 
report forms, statistics gathered by NCES, all evaluation 
and research studies resulting in information to be 
delivered to the government), the Education Division 
within HEW set up its own internal screening mechanism, 
the Educational Data Acquisition Council (EDAC), to 
facilitate OMB clearance. In parallel, the chief state 
school officers, concerned with the time and money 
consumed by responding to federal data requests, also 






with CEIS as an official participant. As noted in 
Chapter 3, the 1978 amendments also introduced the 
requirement for notification and availability by February 
15 of data collection instruments to be used in the 
following school year. The effects of the clearance 
provisions are illustrated by the following examples. 

A contract for a study on sex equity in vocational 
education, mandated by Congress, was awarded in late July 
1977. By early December, with concentrated efforts by
the contractor and the federal project officer, the forms 
clearance package was sent to the OE clearance officer 
who had the job of reviewing submissions to EDAC. The 
clearance officer sent the package forward 2 months 
later, in early February 1978. EDAC clearance was 
obtained on March 1, and the package was then forwarded 
to the Assistant Secretary of Education whose clearance 
was needed before submission to OMB. This clearance was 
obtained on March 22, and OMB clearance, the final 
hurdle, was received on April 14. Because the study had high
visibility and because there were relatively few 
instruments involved, clearance took 4-1/2 months, close 
to the minimum time averaged during that period. There 
were, however, important changes in instrumentation: a 
major questionnaire dealing with attitudes was eliminated 
at the stage of OMB clearance (as were most such items in 
other types of instruments). The ostensible reason for 
the deletion was that the legislation did not require 
collection of that type of information. In this way, a 
review of 3 weeks overrode the work of 4 months—which 
included extensive consultation with parties at 
interest—by the contractor and the project monitor. 

Another example concerns a planned study of Indian 
education scheduled for completion in order to feed into 
the reauthorization process for the legislation, due to 
expire in 1983; hence, the study results should be 
available for hearings likely to be held in 1982. 

Approval for the study was not received from within the 
Department until May 1980; an award was made on September 
30, 1980. Even more than for the sex equity study, the 
choice and design of instrumentation will have to include 
careful consideration of the sometimes conflicting 


data collection, and for getting the whole package 
approved through the clearance mechanisms. If the 
February 15 deadline cannot be met, either a waiver will 
have to be obtained or the study postponed for a whole 
year. Not only will postponement add considerably to its 
cost, but it will make the study irrelevant to the 
purpose for which it is being undertaken, since data 
collection could not even begin before fall of the year 
(1982) in which the congressional hearings are to be held. 
In another case, a recent 12-month study of OE 
evaluation projects, clearance procedures had not been 
completed by the time the study was done and the contract 
had ended. The choice was to delete the data collection 
aspect of the study or to proceed in the absence of 
required clearance. The first would have led to a year 
or more delay in the study, the second to illegal 
procedures. 

Carter (1977) describes two other examples. For the 
sustaining effects study of Title I, a very complex study 
using 10 different types of measures, clearance of the 
first 2 of the 10 sets of measures took 8 months. The 
clearance packages for all 10 sets of instruments 
totalled 1,412 pages; Carter's estimate of the cost for 
the clearance process (not including development of the 
instruments) was $155,500 (in 1976 dollars). The second 
example involved a congressionally mandated study of 
Title I services for neglected or delinquent children; 
clearance took 6 months. Carter notes (1977:11): 

Almost without exception the reviewers, either at 
OE or OMB, had never been to an institution for the 
neglected or delinquent. Many of them were not 
aware of the results of our clinical pretests, yet 
they felt they knew how and in what form the 
material should be collected. Again, 
office-generated expertise superceded actual field 
experience. 

The last example that we cite provides an interesting 
illustration of how the drive toward reducing 



n participants in federal education programs. 

Collection of such data is rare except in those cases 
where specific populations are targeted, for example in 
ESAA (desegregation assistance) and bilingual programs, 
for which information on the targeted group is 
collected. Other exceptions are research studies not 
directly coupled to specific program evaluations, such as 
the National Assessment of Educational Progress (1978). 
Yet given the overall mission of the federal education 
programs to increase equal educational opportunity, it is 
somewhat surprising that programs as a whole are not 
evaluated with respect to their effectiveness in 
improving education for ethnic or racial minorities and 
females. Recently, regulations have been changed to make 
possible the gathering of data on race, ethnicity, and 
gender in grantee applications for funds, but the 
gathering of such data for program assessment has always 
been possible. That it is still largely absent can in 
good part be ascribed to budgetary and clearance 
constraints, which drive any evaluation study toward 
collecting only those data for which there is an express 
"need-to-know." And "need-to-know" is often equated with 
specific mention of a subgroup in legislation for the 
program or its evaluation.7 One can only conclude that 
current clearance procedures, whatever other purpose they 
may serve, have had the effect of minimizing the ability 
to obtain information crucial to meeting federal goals in 
education. In part, that effect may have been the result 
of considering each study in isolation as it went through 
the clearance process and attempting to minimize response 
burden case by case. We note that the process is in the 
midst of change. 

At this time, the intent (expressed both through 
executive action by OMB and through proposed legislation 
in Congress) is to manage the reduction of response 
burden more like fiscal budget allocations: each agency 
submits to OMB an information collection budget that 
requests an allocation of the total number of burden 
hours necessary to carry out its management, evaluation, 
and research responsibilities. On the basis of the 
submission, an allocation will be made by OMB, probably 
with a 10-15 percent cut in response burden, a goal 
announced for 1981. (Another cut is to be made the 


compliance, that is, 

program applicants and grantees, information needed for 
fiscal audits, and information needed to enforce 
compliance with civil rights laws. OMB will delegate the 
responsibility for clearance of specific studies and 
instruments to the agency's internal mechanism when it is 
deemed to be functioning well or the law so specifies, as 
is the case with FEDAC.8 

The evolution of clearance procedures from reviewing 
individual studies to a process that assembles all 
proposed data collection in one document should allow 
top-level Department officials to consider the data needs 
of evaluation and research in a forum where they are 
presented together with those of program administration, 
enforcement (for example, the data needs of the Office of 
Civil Rights), auditing, and the periodic gathering of 
general statistical data and indicators (for example, the 
data collected by NCES). It may also encourage the 
coordination of studies across organizational units so 
that studies proposed by one unit can use data collected 
elsewhere. The Department should be alert to the 
opportunities for more coherent evaluation and data 
collection activities offered by the new clearance 
process. 


Progress 

After clearance, time delays in the progress of a study 
will be occasioned by the inevitable discrepancies 
between assumptions in the study design and actual 
conditions in the field. The nature of the program 
activity, the individuals engaged in it, the willingness 
of respondents to cooperate, the presence of 
documentation: all will present unforeseen difficulties, 
particularly if the timing of the study is thrown off 
schedule by the clearance process. Other delays may be 
introduced by the researchers themselves, who are wary of 
potential criticism and therefore employ time-consuming 
procedures to assure technical impeccability that does 
not enhance the quality of the study (e.g., by meticulous 


the inability of federal monitors to respond in timely 
fashion to simple, much less to complex, requests for 
changes in the study plan, either because of their 
workload or because they do not have authority on their 
own to rule on the requested change. Hence, delay 
becomes no one's responsibility. 


Dissemination 


Within HEW in recent years, dissemination of study 
findings has been held up in the Secretary's office for 
many months because of the need to keep the 
Secretary informed and able to respond to inquiries from 
the media and the public. For example, for the study on 
sex equity in vocational education (referred to above), 
the findings were not released until a year after 
the final report was submitted. The delay 
appeared to be occasioned by the controversial subject of 
the study rather than by the findings themselves, since 
no changes were made in the final report (Harrison and 
Dahl 1979). 

The advent of the new Department of Education brought 
about new rules: a directive on release of findings 
(U.S. Department of Education 1980a) provides 10 days, 
after acceptance of the study by the 
evaluation unit, for response by program and other 
offices. Reports are to be released after the 10-day 
period, accompanied by the comments received. However, 
this rule does not deal with delays occasioned by 
disagreements between the sponsoring office and the 
performers or with sponsoring 
offices other than the evaluation unit. For example, one 
study report submitted in 1980 (David 1980), 
whose findings were contested between the sponsoring 
office (the Assistant Secretary for Program Evaluation in 
the former HEW) and the evaluation unit for 
the Office of Education, had not been released 10 
months later. Congress in particular has been concerned 
with such delays; on occasion, suspicion has arisen 




policy concerns regarding a specific program will change, 
making the findings of the evaluation, when they do 
become available, of little interest. 


Recommendation C-5. Congress should authorize a study 
group to analyze the combined effects of the legislative 
provisions and executive regulations that control 
federally funded applied research. 

Congress has been dissatisfied with the lack of 
relevance and timeliness of much evaluation work in 
education. One of the causes for delay and for 
irrelevance is the accumulation of rules and regulations 
governing the relationships between sponsor, researcher, 
and action site or agency, i.e., the Department of 
Education, the contractor, and the state/school/student. 
The whole process of funding and carrying out applied 
research about social services is severely constrained by 
these rules and by the operating precedents they have 
engendered. Almost every provision now on the books or 
enforced through executive practice may be justified when 
considered in isolation: to prevent favoritism in 
contract awards, to protect respondents from a heavy 
burden of requests for data, to protect the privacy of 
individuals, to require disclosure of information related 
to the public business, and so forth. Their combined 
effect, however, has been to lengthen the time needed for 
compliance, to increase the costs both within government 
(through greater investment of staff time) and of 
extramural contracts and grants, and to discourage whole 
classes of potential performers from participating. 

Though laws sometimes specify time limits for procedures 
(e.g., for OMB clearance of data collection instruments), 
these limits are seldom observed in practice. 

To date, most of the concern has been with instituting 
procedures to guard against possible transgressions in 
initiating and carrying out applied social science 
research. The trade-offs between the benefits of such 
safeguards and the obstacles they create to producing 



problems within a single agency or department and examine 
the process as it works in several different agencies. 


Recommendation D-18. The Department of Education should 
take steps to simplify procedures for procuring 
evaluation studies, carrying them out, and disseminating 
their findings. 

The Committee has recommended (see Chapter 3) that the 
means by which the Department solicits, selects, and 
funds evaluation studies be expanded in order to allow 
more performers to participate. The competitive 
procurement process involving issuance of an RFP and 
awarding of a contract to the highest-ranked or 
lowest-priced bidder is by far the most commonly used 
form of solicitation. This type of solicitation was 
designed by the government for the purchase of highly 
specifiable goods or services so that contracts could be 
awarded on the basis of the best buy for the dollar. The 
rules that have accumulated over the years to ensure fair 
competition have shifted considerable control of the 
process from the technical specialists (for example, in 
the evaluation unit or in a research office) to the 
contracting office, the interpreters and enforcers of the 
government procurement regulations. This has had serious 
implications for the quality of evaluations (discussed in 
Chapter 3) and has increased the time needed for arriving 
at compromises acceptable to all. The process has become 
not only restrictive and inflexible but very costly in 
internal staff time and for potential contractors. And 
since the cost to contractors is recouped eventually from 
the government through overhead and in other ways, the 
government bears the double burden. 

Recent criticisms (U.S. General Accounting Office 
1980a, Gup and Neumann 1980) have focused on abuses 
possible in the use of consultants and sole-source 
contracting. The Committee is not convinced that the 
cost of rules instituted to prevent such abuses is not 
higher than the cost of the abuses themselves. The 
various means (other than competitive procurement through 



Department must be more deliberative in choosing whether 
to use competitive procurement, sole-source contracting, 
8-A contracting, cooperative agreements, basic ordering 
agreements, or grant awards, within the limitations of 
the law (see P.L. 95-224). 

The major sources of delay, once a contract or grant 
for a study has been awarded, must also be identified and 
addressed. This applies particularly to clearance 
procedures and to the in-house handling of requests for 
changes in study design, sampling procedures, testing, 
analysis, time frames, and the like. While a request for 
a modification is being considered, the evaluation may be 
in a hold status, pending the sponsor's response. In 
such cases, the sponsor's nonresponsiveness can 
contribute materially to delays in project completion, 
with the effect of cost overruns. 

At times, failure to perform on time is the 
responsibility of the contractor or grantee. The 
Department should institute and enforce sanctions and 
incentives to encourage timely performance. For example, 
some agencies have included clauses in contracts that 
provide that nontimely performance (products not 
delivered by the specified date) can be a basis for 
nonpayment of up to one-third of the contractor's fee. 

Most contracted evaluations have provisions for review 
of delivered products by the project officer, which often 
may entail extensive internal review and clearance. To 
the extent that these reviews are not completed in an 
efficient and timely manner, the projects are subjected 
to time delays. Such delays may be as injurious as 
budget overruns, leading to delays in dissemination of 
findings and charges of lack of timeliness. Because of 
the possible cost of such delays, Recommendation D-13 
(see Chapter 4) seeks to limit the period of control over 
evaluation results. The Committee is not advising 
against review: quite the contrary. It is advocating 
that the time taken for internal review be shortened in 
favor of making findings freely available to stand the 
test of the marketplace. In the long run, this will both 
increase the quality and improve the chances of 
appropriate use of evaluation results. 


1 There are exceptions. Political appointees given the 
job of reducing the budget will have reasons to find 
reduced needs. 

2 At present, the Office of Civil Rights (OCR) is 
funding a study to review testing and evaluation 
instruments used with handicapped persons and another 
study to identify the factors that cause 
overrepresentation of minority children in programs 
for the mentally retarded. OCR has also funded 
cost-benefit analyses of programs mandated under civil 
rights legislation (O'Neill 1976). 

3 For fiscal 1980, the budget for the Department of 
Education was $14.2 billion. For ESEA Title I, the 
1980 budget provided $3.2 billion; for Education for 
the Handicapped, $1.05 billion, and for Rehabilitation 
Services and Handicapped Research $932 million; for 
vocational education, $928 million; for impact aid, 
$825 million; for emergency school aid, $249 million; 
and for bilingual education, $167 million. 

4 As an example, when the National Institute of 
Education was under an edict from its governing body, 
the National Council for Educational Research, to 
increase the percentage of funds spent for basic 
research, it shifted its labeling of certain 
activities from "evaluation" to "research." Since the 
boundaries are often fuzzy, this kind of redefinition 
is not infrequent. As a counterexample, nearly $1 
million allocated to the evaluation of Title VII 
(bilingual education) were reprogrammed in fiscal 1980 
by the former Assistant Secretary for Education in HEW 
to support further development of "Villa Alegre" (the 
bilingual analog to "Sesame Street"), a decrease of 
more than one-third in the actual evaluation budget, 
though reporting figures stayed unchanged (see 
Appendix A). 

5 "Best" has different connotations in different 
instances: it may mean the lowest-priced proposal of 
those technically acceptable; it may mean the 
lowest-priced proposal of those exhibiting high 
degrees of excellence; or it may mean some combination 
of these and other criteria spelled out in the RFP. 

6 The information on this study was provided by Robert 
Maroney and Dorothy Shuler of the Office of Program 
Evaluation, the central evaluation unit. Their help 
in tracing the clearance procedures and other process 
steps is gratefully acknowledged. 



data are deemed to be irrelevant or dangerous, they 
raise costs by requiring larger samples, and they are 
the concern of enforcement rather than of evaluation 
staff. 

8 FEDAC has a permanent staff of four professionals, 
augmented by three to four professionals on detail 
from other units or from outside the Department. From 
time to time, however, FEDAC staff are themselves 
detailed for considerable periods of time to other 
duties. Staff shortage has been a major cause of 
delays in obtaining clearance. 



Glossary 


AERA 

American Educational Research Association 

AFT 

American Federation of Teachers 

AIR 

American Institutes for Research 

ARROE 

American Registry of Research and Related 
Organizations in Education 

ASPE 

Assistant Secretary for Planning and 
Evaluation 

ASPIRA 

An educational research group oriented 
toward Puerto Rican interests 

BEH 

Bureau of Education for the Handicapped 
(OE), now Division for Special Education 
and Rehabilitation Services 

BOAE 

Bureau of Occupational and Adult Education 
(OE), now Division of Vocational and Adult 
Education 

CCSO 

Council of Chief State School Officers 

CEIS 

Committee on Evaluation and Information 
Systems 

CENTRAL (EVALUATION) UNIT 

formerly the Office of Program Planning, 
Budgeting, and Evaluation (OPPBE/OE), later 
the Office of Evaluation and Dissemination 
(OED/OE), now the Office of Program 
Evaluation (OPE/ED) 

COSSMHO 

National Coalition of Hispanic Mental 
Health and Human Services Organizations 


CRS 

Congressional Research Service 

DISTAR 

A reading program for primary grades 

DOL 

U.S. Department of Labor 

ED 

U.S. Department of Education 

EDAC 

Educational Data Acquisition Council 

EDUCOM Inc 

A private corporation performing 
educational research and development 

ERIC 

Educational Resources Information Center 

ESAA 

Emergency School Assistance Act 

ESAA-TV 

A series of television programs aimed at 
minority group children of school age 

ESEA 

Elementary and Secondary Education Act 

FEDAC 

Federal Educational Data Acquisition Council 

FNS 

Food and Nutrition Service of the U.S. 
Department of Agriculture 

FY 

Fiscal Year 

GAO 

U.S. General Accounting Office 

GPMD 

Grant and Procurement Management Division 
(OE) 

HEW 

U.S. Department of Health, Education, and 
Welfare 

HHS 

U.S. Department of Health and Human Services 

IDEA 

Institute for Development of Educational 
Activities 


IG 

Inspector General 

ISA 

Intermediate Service Agency (set up by SEAs 
and LEAs to provide services to LEAs) 

ISD 

Independent School District 

JDRP 

Joint Dissemination Review Panel (OE-NIE) 

LEA 

Local Education Agency 

MDRC 

Manpower Demonstration Research Corporation 

NAACP 

National Association for the Advancement of 
Colored People 

NCES 

National Center for Education Statistics 

NEA 

National Education Association 

NIE 

National Institute of Education 

NIH 

National Institutes of Health 

NSF 

National Science Foundation 

OE 

Office of Education 

OED 

Office of Evaluation and Dissemination 
(central evaluation unit in OE) 

OMB 

Office of Management and Budget 

OPE 

Office of Program Evaluation (current 
designation of central evaluation unit in 
Division of Management, ED) 

OPPBE 

Office of Program Planning, Budgeting, and 
Evaluation (former title of central 
evaluation unit in OE) 

PAC 

Parent Advisory Committee (Title I, ESEA) 

P.L. 

Public Law (for example, P.L. 94-142, 
Education for All Handicapped Children Act 
of 1975) 


RDD&E 

Research, Development, Dissemination, and 
Evaluation 

RDU 

Research and Development Utilization 
Program (NIE) 


RFP 

Request for Proposal 


RFQ 

Request for Qualifications 


SBA 

Small Business Administration 


SDC 

Systems Development Corporation 


SEA 

State Education Agency 


TAC 

Technical Assistance Center (Title I, ESEA) 

USOE 

U.S. Office of Education 
A 

Federal Evaluation Activities in Education: 
An Overview 
Elizabeth R. Reisner 


Federal funds support a broad range of program evaluation 
activities in education. Such activities range from 
national studies involving achievement testing of 
thousands of students to local assessments of federally 
supported projects in individual school districts. 

This paper is intended to provide an overview of those 
federal evaluation activities that are designed to yield 
information on federal education assistance programs. 

The first section of this paper describes the major 
evaluation activities of each of the organizational units 
making up the former Education Division of the former 
U.S. Department of Health, Education, and Welfare (HEW) 
and certain other units. Taken together, these units 
constitute the main offices currently conducting 
evaluation activities in the U.S. Department of Education 
(ED). Information on evaluation activities of these 
offices is presented in tabular form and contains (1) a 
listing of the major federal education programs being 
evaluated by each of the organizational units sponsoring 
education evaluations, (2) a description of each unit's 
principal evaluation objectives, and (3) a rough estimate 
of the fiscal 1980 funds used for evaluation by each of 
the units. 


The author is a senior policy analyst with NTS Research 
Corporation in Washington, D.C. Previously, she had 
staff responsibility for the review of evaluation 
planning in the Office of Education. 
by state and local agencies. The third section describes 
the evolution of the federal role in the evaluation of 
education programs. The final section describes the 
process used for deciding what national studies of 
federal education programs are conducted and what 
questions those studies address. 

Information for this study was collected in interviews 
with federal managers whose offices are responsible for 
conducting program evaluations in education as well as 
from the works listed in the references. In several 
instances internal memoranda of HEW, the Office of 
Education (OE), and ED were used as source materials. 
Because the intent of the paper is to present a broad 
overview of the topic, it has been necessary to summarize 
detailed information in a number of cases; the author 
accepts full responsibility for any unintentional errors 
of fact or emphasis that may have occurred in preparing 
the summaries. 

Authority for the Department of Education was enacted 
on October 17, 1979, as P.L. 96-88, the Department of 
Education Organization Act. The act permitted a 6-month 
implementation period prior to official start-up of the 
new department. ED was officially inaugurated on May 4, 
1980. In this paper, policies and procedures in effect 
prior to that date are described using the earlier 
organizational terminology (e.g., OE and the 
Commissioner). Current terminology (e.g., ED and the 
Secretary) is used to describe activities occurring after 
May 4, 1980. 


MAJOR EVALUATION ACTIVITIES 
OF THE HEW EDUCATION DIVISION 

Table A-1 provides summary descriptions of federally 
supported evaluation activities designed to provide 
information relevant to programs administered by the 
former HEW Education Division. The primary offices 
within the Education Division were OE, the National 
Institute of Education (NIE), and the National Center for 
Education Statistics (NCES); these offices are now 
organizationally situated within ED. The information in 
Table A-1 pertains primarily to former Education Division 
offices because these are the offices for which 
comparable information was most readily available. 



with data on activities of the U.S. General Accounting 
Office. Although data in the table were compiled in May 
1980, there have not been major changes in the use of 
fiscal 1980 funds. 

A broad, inclusive definition of program evaluation 
was used in compiling the data presented in Table A-1. 

It is adapted from the definition used by Robert Boruch 
in his proposal to OE to conduct a study of federally 
supported education evaluations at state and local levels 
(discussed in the second section of this report). 

Boruch's definition, which is consonant with that used by 
the Committee (see Chapter 2), includes the following 
activities under the heading of program evaluation: 
needs assessments, surveys, and other assessments 
conducted prior to program initiation or review; process, 
or formative, assessments intended to yield descriptive 
information on the composition, organization, or 
activities of a program; outcome, or summative, 
assessments intended to yield information on the relative 
benefits, costs, and other effects of a program; and 
cost/benefit analyses intended to draw together 
information on several types of program effects. 

The category headings used in Table A-1 are as follows: 

• "Federal office conducting evaluation activities" 
refers to offices implementing evaluations (for in-house 
efforts) and offices overseeing evaluation contracts (for 
contracted studies). The organizational headings do not 
necessarily reflect offices of equal bureaucratic rank. 

• "Programs being evaluated" refers to the 
principal federal programs that are being studied. 

• "Main evaluation objectives" reflects the 
priorities as described by federal evaluation managers in 
interviews for this project and in written statements 
prepared as part of the HEW evaluation planning process. 
The information in the table does not include federally 
supported evaluations conducted by local projects for 
purposes of either self-assessment or fulfillment of 
federal program requirements. 

• "Federal funds used for evaluation in fiscal 
1980" comprises estimates reported by evaluation managers 
and described in internal planning papers. Funds used in 
fiscal 1980 are indicated because that is the most recent 
year for which fairly precise estimates are available. 










of the offices indicated. 

Among the categories of information presented in Table 
A-1, the category most vulnerable to change is the annual 
funding data. These amounts are subject to considerable 
fluctuation within any given year because of decisions to 
move funds into or out of accounts previously designated 
for evaluations and because of different interpretations 
as to whether a given project is an evaluation or a 
research activity. An example of the first type of 
fluctuation was the decision early in fiscal 1980 to 
transfer funds out of the "line item" appropriation for 
studies and evaluation of bilingual education programs in 
order to fund a bilingual television project. A total of 
$700,000 in OE funds for federal program administration 
was designated to be used to replace the transferred sum, 
but because of high expenses associated with implementing 
the new ED, the bilingual evaluation funds were not 
replaced. An example of the second type of fluctuation 
can be seen in NIE's reports of its own program 
expenditures. Because of an administrative decision to 
allot the maximum amount of NIE's funding to research 
purposes, the Institute intentionally labels very few of 
its projects as evaluations, even though many have 
characteristics that conform to the definition presented 
above. 

The aspect of the table most likely to provoke 
questions from readers is the inclusion of federally 
conducted audits of federal, state, and local 
implementation of federal programs. Audits are generally 
not considered to be evaluative in nature, especially 
since they usually focus only on the fiscal operations of 
individual federally funded projects. In recent years, 
however, federal audits have become increasingly 
concerned with nonfiscal matters, particularly state and 
local compliance with legislated objectives and 
procedures. The adoption of this auditing focus has 
resulted, in some instances, in a blurring of the 
distinction between audits and evaluations, particularly 
given the establishment of specified national priorities 
for federal education audits. For example, the fiscal 
1980 work plan for the HEW Office of the Inspector 
General identified three priorities for audits of state 
and local administration of Title I of the Elementary and 
Secondary Education Act (ESEA): (1) compliance with the 


Title I state requirements for monitoring and enforcement 
plans; and (3) operations of the centralized Migrant 
Student Record Transfer Service funded under Title I. 

With the establishment of explicit compliance-oriented 
auditing objectives such as these, federally conducted 
audits have acquired a distinct resemblance to program 
evaluations. 


FEDERALLY SUPPORTED EVALUATIONS 
CONDUCTED BY STATE AND LOCAL AGENCIES 

Virtually all federal education aid programs require 
institutional grantees to conduct evaluations of their 
own performance. The specific language of the 
legislative requirements varies among programs, depending 
on the overall objectives of the program and also on the 
evaluation methodologies considered by federal 
administrators to be best suited to the particular 
program. For programs with a large state administrative 
role, such as ESEA Title I and the state grant program 
under the Vocational Education Act, states are also 
required either (1) to collect local evaluation data and 
provide summaries of these data to ED on a regular basis 
or (2) to carry out their own state-managed evaluation 
efforts. 

In recent years congressional mandates and Education 
Division program managers have identified state and local 
evaluation priorities with increasing specificity, but the 
offices of the former Education Division do not at 
present collect regular data on the implementation of 
state and local evaluation requirements. Therefore, it 
is not possible to determine what portion of ED program 
grant funds are used by grantees for self-evaluation 
purposes nor is it possible to determine exactly how 
those funds are used. It is apparent, however, that 
significant amounts of federal funds are used to provide 
assistance to state and local agencies in improving the 
quality of their evaluations. 

Evaluations conducted by state and local agencies are 
generally funded using program grant funds. At the state 
level, evaluation activities are supported using state 
administrative funding provided by the pertinent federal 
program. ESEA Title I, for example, provides each state 
educational agency with 1.5 percent of the state's total 



school year 1979-80, amounts available for Title I state 
administrative activities, including evaluation, ranged 
from $4.5 million in New York to $225,000 in the 14 
states with the lowest Title I enrollments. Other 
federal education programs also provide administrative 
funding to state education agencies. 

At the local level, evaluation activities must be 
supported out of each school district's federal grant 
funds. The district's grant application usually 
describes the evaluation activities planned by the 
district and indicates how much of its grant is proposed 
to be used for evaluation purposes. That proposal is not 
generally binding on the district, however, once the 
federal grant is received. (For more detail on the 
funding and management of local evaluation activities, 
see Appendix C.) Examples of the types of state and 
local evaluation activities carried out under three 
federal education programs are described below. 


ESEA Title I 

As a result of a requirement contained in the Education 
Amendments of 1974 (P.L. 93-380), OE developed a set of 
local evaluation models for use by Title I grantees. The 
models, as specified in federal regulations (45 CFR 116.7 
and 116a.50-57 published in the Federal Register on 
October 12, 1979), provide methods for measuring student 
achievement gains in reading, mathematics, and language 
arts. ED (and formerly OE) also provides technical 
assistance (at a cost of $11 million in fiscal 1980) to 
state education agencies on methods for assisting local 
districts in the use of the models. Despite extensive 
efforts by OE since 1974 in designing and implementing 
the models, Congress has expressed concern in committee 
reports for the Education Amendments of 1978 (P.L.
95-561) that the Title I evaluation models do not yield
data that can be used by local Title I administrators as 
a basis for improving Title I projects (U.S. Congress 
1978a:51, and U.S. Congress 1978b:29-30). Findings in 
support of this view have also been presented by David 
(1980) and Orland (1980), but they are contradicted by 
statements of the ED evaluation office. 



ESEA Title VII

The Education Amendments of 1974 also mandated that
evaluation models be developed for use by local districts 
receiving funds under ESEA Title VII, the Bilingual 
Education Act. The Education Division did not 
immediately implement that mandate, however, and it was 
reiterated in the Education Amendments of 1978. The 
Senate committee report on the 1978 amendments expressed 
hope "that these guidelines will provide scientifically 
valid information as well as describe the unique features 
of each project in order that local level projects can be 
validly compared" (U.S. Congress 1978b:69). The ED
evaluation office is currently overseeing a project 
intended to yield evaluation models for use by Title VII 
grantees. In early descriptions of the project, the 
evaluation office has stated that the models are to be 
designed on the basis of existing approaches (including 
the current Title I evaluation models) and are not to 
reflect any new or "basic research." 

As in Title I, the Title VII program also funds 
technical assistance providers who are expected to assist 
local Title VII grantees in improving the quality of 
their self-evaluations. Until the evaluation models are 
ready, however, grantees and assistance providers have 
relatively little guidance on which to base local 
evaluation efforts, except for criteria in the Title VII 
final regulations requiring attention to "data collection 
instruments and methods," "data analysis procedures,"
"time schedules," and the like (45 CFR 123a.30(e)
published in the Federal Register on April 4, 1980). 


Vocational Education Act 

The Education Amendments of 1976 (P.L. 94-482) 
established a comparable set of requirements for the 
Vocational Education Act. States are required to use 
"statistically valid sampling techniques" to measure "the 
extent to which program completers and leavers (i) find 
employment in occupations related to their training, and 
(ii) are considered by their employers to be well-trained 
and prepared for employment" (Section 112(b)(1)(B) of
the Vocational Education Act). In addition, the
legislatively mandated "national center for research in 
vocational education" is to "work with states, local 
educational agencies, and other public agencies in 



required by Section 112, so that these agencies can offer 
job training programs which are more closely related to 
the types of jobs available in their communities, 
regions, and states..." (Section 171(a)(2) of the Act). 
The national center at Ohio State University has prepared
materials relevant to its technical assistance role; a
recent list of its activities includes three projects
aimed at implementing this mandate: "Evaluation Services 
for Education Agencies," "Evaluation Handbooks," and 
"Increasing the Credibility of Vocational Education
Evaluations" (listed in Gordon et al. 1979:62-63, 153). 
The NIE mandated study of vocational education is 
currently examining the performance of states in 
implementing their evaluation requirements. 


Studies of State and Local Evaluation Activities 

Despite these extensive statutory mandates for state and 
local evaluations, the only effort up to now to review 
federally supported state and local evaluations across 
federal programs has been the recent study by Boruch and 
Cordray (1980). That study provides information on those 
state and local evaluation activities aimed at producing 
data relevant to federal categorical programs. There are 
also three studies (one of which is under way now) that 
provide information on state and local evaluation 
activities supported from a variety of sources, federal 
and nonfederal. 

Survey of large school district evaluation units. The
Center for the Study of Evaluation at the University of 
California at Los Angeles has examined the organization 
of local school district offices of evaluation. This 
survey acquired data on the size, staffing, and 
organizational structure of evaluation offices in school 
districts with enrollments over 10,000 (Lyon et al. 1978). 

Survey of educational researchers and research 
organizations. Under contract with NIE, the Bureau of
Social Science Research in 1976-78 surveyed nonfederal 
organizations conducting research, development, 
dissemination, and evaluation activities in education. 
Information was obtained on funding, organizational 
characteristics, and activities of 2,434 such entities 
(Frankel et al. 1979) (see Appendix B). 


NIE contract to the Huron Institute in Cambridge, 
Massachusetts, this study is intended to develop 
strategies for helping school districts make better use 
of evaluation and test information. Initial reports from 
the study were made available in the fall of 1980; the 
final report is to be issued in the fall of 1981. 

Although each of these studies sheds light on state 
and local evaluation activities in education, none 
provides a comprehensive description of state and local 
evaluations undertaken to assess the operations of 
federal programs. 


EVOLUTION OF THE FEDERAL ROLE 
IN THE EVALUATION OF EDUCATION PROGRAMS 

Evaluation requirements are a relatively recent addition 
to federal education programs. The first mandatory 
evaluations for an OE program were those carried out by 
local districts implementing ESEA Title I projects. In 
1965 Senator Robert Kennedy introduced language into the 
draft version of Title I requiring that "effective 
procedures, including provision for appropriate objective 
measurements of educational achievement, will be adopted 
for evaluating at least annually the effectiveness of the 
programs in meeting the special educational needs of 
educationally deprived children" (Section 205(a)(5),
P.L. 89-10). Over the next several years local
evaluation requirements were added to other OE program 
authorities, and by 1970 several OE bureaus had 
designated evaluation coordinators whose role was to 
oversee local evaluation efforts and occasionally to 
conduct small studies at the national level, usually 
relying on OE general administrative funds (under the 
"Salaries and Expenses" account in the annual 
appropriation) for financial support of any contracted 
projects. 

The fiscal 1970 appropriation for OE contained for the 
first time, however, a $9.5 million line item for OE 
evaluation and planning activities. Also in that year 
John W. Evans was named to head the first OE-wide 
evaluation office to oversee the expenditure of those 
funds. To administer a centralized evaluation and 
planning function, Evans assembled an evaluation staff, 
composed largely of the evaluation coordinators who had 



then been sources of bureau-level evaluation support. 

After that beginning, the activities of the evaluation 
office grew steadily for the next several years. 

With the legislative creation of NIE in 1972, the
organizational structure for OE studies of education
programs was altered somewhat. With a few exceptions,
those OE functions that were primarily research oriented
were transferred to the new agency. Notable exceptions
were the research activities carried out as an adjunct to
the OE program for the education of handicapped
children. 1 The director of the program argued that the
research activities for the education of the handicapped 
were so closely related to state and local program 
support activities that handicapped research should not 
be moved to NIE. The OE handicapped office was 
successful in this argument and thus paved the way for 
the 1975 legislative directive in the Education of All 
Handicapped Children Act (P.L. 94-142) that the major 
national evaluation activities required in the Act were 
to be administered by the OE Bureau of Education of the 
Handicapped (BEH) and not in the central OE evaluation 
office. 

The move towards decentralization of evaluation 
functions was underscored by language specifying that the 
new national center for research in vocational education 
was to be lodged in OE. This action had implications for 
OE evaluations because the research center was given 
specific responsibilities for developing evaluation 
methods and assisting state and local agencies in 
implementing program evaluations. In the trend towards 
decentralization of evaluation activities, it was equally 
important that Congress specified in the vocational 
education statute (Section 160 (a)(1)) that "the 
administration of all the programs administered by this 
Act" was to be the responsibility of the Bureau of 
Occupational and Adult Education (BOAE). Thus, the 
management of the national vocational research center and 
its mandated evaluation activities were explicitly 
assigned to the OE operating bureau, not to the central 
evaluation office or to NIE. 

The most recent step in this trend has been the shift 
of responsibility for evaluation of the Follow Through 
program. As a result of a short-term "exploratory" 
evaluation of the program, the OE Commissioner in 1979 
decided to move Follow Through evaluation activities from 



program staff for the stated purpose of making the Follow 
Through studies more relevant to program operations. 
Undoubtedly, another factor was displeasure of the staff
with a recent large evaluation of the impact of Follow
Through services on student development, reflecting a
frequent pattern of program office/evaluation office
tension (noted in the final section of this paper).

In addition to the handicapped, vocational, and Follow 
Through evaluation activities, OE's evaluation function 
had been decentralized in several other ways, even before 
the new ED was created. The evaluation office, for 
example, has invited the participation of program 
managers in all major decisions affecting evaluations in 
their respective program areas. The evaluation planning 
process, described in the following section, relies 
heavily on the judgments and recommendations of program
managers. The importance of this consultation is in some 
senses highlighted by the increase in statutory 
set-asides of annual program appropriations for national 
evaluations. The Emergency School Aid Act of 1972 (P.L. 
92-318) specified a set-aside of up to 1 percent of 
annual appropriations for national program evaluations. 
Two years later, the 1974 reauthorization of ESEA Title I 
authorized up to one-half of 1 percent of annual Title I 
appropriations for program evaluation and studies. In a 
slightly different pattern, the 1974 reauthorization of 
ESEA Title VII established a new "Part C - Supportive 
Services and Activities" to be administered by the HEW 
Assistant Secretary for Education. The 1978 amendments 
to Part C authorized studies that are clearly evaluative 
in nature, including studies of Title VII effects on
students with language proficiencies other than English 
and of methods for identifying students to be served by 
Title VII projects. Because the statute assigned 
administrative authority for Part C to the HEW Assistant 
Secretary for Education, the OE evaluation office was
only one of four offices that have in the past several
years reviewed plans for bilingual activities; the other
offices have been the OE Office of Bilingual Education,
NIE (since it is given specific statutory
responsibilities under Part C), and NCES (since it
conducts statistical studies supporting Title VII).
Under the new Department of Education, the Part C
coordinating function is being carried out by the Office
of Bilingual Education and Minority Language Affairs.




One of the most difficult problems affecting program 
evaluation efforts in the Education Division and in ED 
has been determining the best way to identify program 
evaluation needs. 2 The problem is largely one of 
organization. Program managers need to be consulted 
regarding any studies to be done in their respective 
program areas, and in fact the ED evaluation office has 
been consistently careful to ask for the suggestions of 
program managers. Program managers and evaluation 
managers often disagree, however, with regard to 
evaluation priorities for a given program. Program 
managers are more likely to ask for evaluation studies 
that will help them improve existing management tools or 
will enlarge their information about their program 
operations; evaluators tend to be more concerned with 
whether or not a program is effectively meeting a 
longer-range objective, such as the improvement of 
academic achievement (or college enrollment rates or 
English proficiency) for a defined group of students. 
Program managers may not place a high priority on 
evaluations of program effectiveness because they believe 
that first-order questions (e.g., "Are the intended 
children receiving the intended program service?") should 
be answered first or because they fear the consequences 
of unfavorable answers to program effectiveness 
questions. In addition to this disagreement over the 
purposes of evaluations, another organizational problem 
is that senior-level program managers often simply are 
not willing to take the time to consider evaluation 
priorities at the time that decisions must be made. 

The OE, now ED, evaluation office has addressed this 
need for program consultation by seeking formal 
suggestions for evaluations from program managers once a 
year. Through 1978 the strategy was to issue an annual 
request for project recommendations from program managers 
and then to use those recommendations as one factor in 
developing a list of projects to be undertaken in the 
following year. This list was then submitted to the HEW 
Assistant Secretary for Planning and Evaluation (ASPE) 
for final approval. The amount of scrutiny by ASPE 
varied from year to year; generally only the central 
evaluation unit's plans were subjected to critical review 
even though other units, such as NIE and NCES, also 
submitted their plans. 



was intended to make plans more responsive to concerns of 
Congress and senior HEW and ED policy makers. The main 
foci of this attention were the proposals of the OE, and 
then ED, central evaluation office, but the senior-level 
review group convened for the purpose also reviewed 
fiscal 1979 evaluation plans prepared by BEH, BOAE, and 
the Bureau of Student Financial Assistance (BSFA). The 
plans of the central evaluation office, which received by 
far the major portion of the group's time and concern, 
were criticized and modified by the group primarily with 
regard to the proposed timing of studies and their 
expected cost; in a few instances plans for impact 
studies were delayed by the group and program needs 
assessment projects were suggested to precede impact 
studies. The group's primary objective with regard to 
timing was that new evaluation studies should be 
scheduled to provide useful program data in time to make 
substantive contributions to legislative debates on 
program reauthorization. Cost considerations entered the 
decisions to reduce the scope of tasks proposed in 
certain studies and to eliminate some tasks from other 
studies. Preliminary studies of program need were 
recommended in instances in which policy questions 
existed about the national need for the type of services 
to be provided by the program under review. The new 
review procedure was also used for 1980. The resulting 
evaluation plan marked the first time that a 
comprehensive OE-wide plan had been assembled. 

An example of the new review procedure in action was 
the group's decision on the proposal of the evaluation 
office to examine the effectiveness of the developing 
institutions program, Title III of the Higher Education 
Act (P.L. 92-318, amended by P.L. 94-482). In that 
action the group decided that it was premature to 
consider the effectiveness of the program in improving 
the financial and educational viability of the 
institutions being funded. The group decided that a 
necessary first step was to identify a set of reliable 
indicators to apply to the financial status of a college 
or university in order to determine the financial 
strength or weakness of the institution under review. It 
was also determined that an "exploratory evaluation" of 
the developing institutions program should be 
conducted. 3 The purpose of the exploratory study would
be to identify practical, usable measures of successful 



with a larger-scale study, which would—among other 
things—actually measure whether or not the developing 
institutions program was being fully implemented by 
institutions receiving awards under the program. 

Under ED Secretary Shirley Hufstedler, the 
organizational setting for program evaluation reflected 
the increased emphasis on linkages between evaluation and 
program improvement. The central evaluation office in ED 
reported organizationally to the Deputy Assistant 
Secretary for Evaluation and Program Management, who in 
turn reported to the Assistant Secretary for Management. 
The Program Evaluation Office was organizationally 
coequal to the Management Evaluation Office, which was 
assigned responsibility for management evaluation, 
management quality assurance, program assessment, and 
organizational development. In his statement before the
Senate Human Resources Committee prior to confirmation, 
John Gabusi, Secretary Hufstedler's Assistant Secretary 
for Management, expressed his intent to improve the use 
and usefulness of ED evaluations for purposes of 
management improvement in ED programs, decisions on 
program budgets, and fulfilling information needs of 
Congress prior to legislative reviews. 

Gabusi's statements and the structure within which the 
program evaluation function was organizationally housed 
at that time reflect to a considerable extent the 
priorities expressed in Circular No. A-117 issued by the 
U.S. Office of Management and Budget in March 1979. 
Entitled "Management Improvement and the Use of 
Evaluation in the Executive Branch," this directive to 
federal agencies construes program evaluation as a 
component of federal management improvement. As stated 
in the circular, "agency evaluation systems . . . should 
focus on program operations and results. They should 
include procedures to assure that evaluation efforts 
result in specific management improvements that can be 
validated" (page 2). The organizational structure under
Secretary Hufstedler reflected these priorities and may 
have indicated the direction of upcoming ED evaluation 
activity. No information is available at this writing, 
however, on the program evaluation plans of Terrel Bell, 
Hufstedler's successor as Secretary of Education. 

The evaluation of federal education programs has 
undergone considerable change in the 10 years in which it 
has been a major federal activity. These changes have 


we have seen the federal evaluation function centralized 
into a single agency-wide unit and then gradually 
decentralized to some degree. The creation of the 
Education Department may be quickening the pace of change 
that characterizes this process. Given these 
circumstances, it is essential that the direction and 
character of federal education evaluations be informed by 
expert, dispassionate analysis of possible methods for 
increasing the utility of federal evaluations as a tool 
for improving education. 


NOTES 

1 A second important exception was the policy research 
activities carried out by the Education Policy 
Research Centers. At the recommendation of Evans in 
1972, those centers (three in number at that time) were
moved from the OE evaluation office to the newly
created Office of the Assistant Secretary for 
Education in order to support that office's activities 
in education policy development. 

2 A similarly difficult issue has been the utilization 
of evaluation findings. This issue is addressed in 
Boruch and Cordray (1980) and in the report of the 
Committee. 

3 Such studies were also undertaken in a number of other 
program areas at the instigation of Joseph Wholey,
ASPE Deputy Assistant Secretary, who had developed the
notion of exploring the "evaluability" of a program 
before full evaluations were done. 


APPENDIX B

Performers of Federally Funded 
Evaluation Studies 
Laure M. Sharp 


INTRODUCTION AND DATA BASE 

The evaluation of federally funded social initiatives in 
education—as in health services, crime control, or 
housing programs—is seldom carried out by federal 
agencies. The bulk of evaluation performers are private 
research firms, academic bureaus, and state and local 
agencies, which receive federal funds to conduct 
evaluations commissioned by congressional mandate or by 
executive policy makers or to carry out evaluations on 
their own initiative with federal support. Although much 
has been written on evaluation methodology and quality, 
on one hand, and on the uses and abuses of the grant and 
contract system under which federal funds are channeled 
to outside performers, on the other, there is no single 
useful data base that provides figures on federal funds 
spent in a given fiscal year on evaluation activities, 
the portion of such funds allocated to outside 
contractors or grantees, and the identification of 
contract and grant recipients. 

Evaluations in the field of education represent a 
large share of all federally funded evaluation 
activities, probably on the order of one-fifth or 
one-fourth of those activities. 1 More specific 
information exists with respect to the performers of 
however, information is not nearly as extensive and
reliable as one would need for a comprehensive 
assessment. The procedure of piecing together relevant 
information from various sources is subject to a high 
degree of imprecision for several reasons: 

• There is no commonly accepted definition of
evaluation activity. In particular, the boundaries
between evaluation and research are far from clear-cut,
as discussed by Reisner in Appendix A and by Abramson
(1978) in his work on federal funding of social research
and related activities. Evaluation performers themselves
are even more inconsistent with respect to these 
boundaries. 

• The data that are available seldom refer to the 
identical time span. Yet the volume and nature of 
federally funded evaluation activities in education have 
varied considerably over the time period (1974-79) 
considered in this paper. 

• While evaluation studies commissioned by federal 
agencies have been increasingly funded in the form of 
contracts awarded through the competitive procurement 
process, work in the evaluation area is also awarded in 
the form of grants and "sole-source" awards. In
addition, existing contracts and grants are often 
extended and modified, frequently with the addition of 
new funds. Information about these types of funding 
activities is difficult to locate. 

• The prevailing revenue-sharing model under which 
large funds are allocated to state and local 
jurisdictions on a discretionary basis makes it almost 
impossible to estimate the level of evaluation activities 
carried out by these jurisdictions. In particular, 
systematic documentation is lacking about the extent to 
which such activities are performed by staffs of state 
and local education agencies or under grant and contract 
arrangements by outside organizations. While there is 
some discussion in this paper of the evaluation 
activities of state and local education agencies, data 
presented for those sectors should be viewed as 
especially rough estimates. 

• While many contracts or grants may be awarded for 
the exclusive purpose of conducting an evaluation, there 
are probably many more instances where evaluation is 
merely one component of a project. This is especially 
true of social experiments and demonstration programs. 



National Institute of Education. To create a listing of 
potential performing organizations, a variety of sources 
was used, including rosters of state departments of 
education, intermediate education agencies, local school 
systems, federal grantees and contractors, and authors of 
articles in 82 pertinent journals. The ARROE project 
initially identified more than 6,300 organizations that 
might meet the criteria for inclusion in the survey, and 
a questionnaire was mailed to each organization. 
Organizations that had been active performers during 
their last completed fiscal year and were distinct 
organizational entities were considered eligible for the 
survey and were asked to complete the entire 
questionnaire. Organizations that failed to respond were 
contacted by telephone, and, if eligible, were asked a 
number of key questions. Of the 6,346 organizations on 
the original mailing list, 81 percent were contacted and 
their eligibility established. Of the 5,208 
organizations with whom contact was made, data from just 
about half (2,434) were included in the data analysis; 
most of the others were ineligible, frequently because 
they had not carried out educational RDD&E during their 
most recent fiscal year. (The derivation of the ARROE 
data base is sketched out in Table B-1.) Slightly less
than half of the reporting units had returned the 
detailed mail questionnaires, while slightly more than 
half of the units were asked the abbreviated set of 
questions in a telephone interview. Thus, the ARROE 
survey yielded two data sets: a basic set for all 
organizations for whom some data were obtained (N = 2,434)
and a more detailed set (N = 1,071) limited to those
organizations that completed mail questionnaires. 2 The 
2,434 performing organizations covered by the survey were 
located in 1,530 separate institutions (see Table B-2). 3 
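
The proportions described in this paragraph can be checked directly from the counts quoted in the text. The following is a minimal sketch only; the variable and function names are illustrative assumptions and do not come from the ARROE project, and the figures are simply those cited above.

```python
# Counts quoted in the text for the ARROE survey.
MAILED = 6346          # organizations on the original mailing list
CONTACTED = 5208       # organizations with whom contact was made
ANALYZED = 2434        # organizations included in the data analysis
MAIL_DETAILED = 1071   # organizations that completed mail questionnaires
INSTITUTIONS = 1530    # separate institutions housing the analyzed organizations


def arroe_shares():
    """Return the proportions implied by the counts above (hypothetical helper)."""
    telephone_only = ANALYZED - MAIL_DETAILED  # abbreviated telephone interviews
    return {
        "contacted_of_mailed": CONTACTED / MAILED,
        "analyzed_of_contacted": ANALYZED / CONTACTED,
        "mail_of_analyzed": MAIL_DETAILED / ANALYZED,
        "telephone_of_analyzed": telephone_only / ANALYZED,
        "orgs_per_institution": ANALYZED / INSTITUTIONS,
    }


if __name__ == "__main__":
    for name, value in arroe_shares().items():
        print(f"{name}: {value:.3f}")
```

On these counts, the analyzed organizations are a little under half of those contacted, and the mail questionnaires account for a little under half of the analyzed set, which agrees with the wording of the text. The first ratio comes out near 82 percent, close to the 81 percent quoted above; the source of the small difference is not stated.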

While evaluation was one of the activity areas covered 
by the ARROE survey, it was not its primary focus. The 
ARROE staff—in consultation with an advisory committee 
on which the principal types of performers were 
represented—came to the conclusion that in fact most 
organizations that perform research and research-related 
activities would find it difficult to differentiate 
between types of functions in funding, expenditures, and
staffing. This was believed to be the case especially
with respect to basic versus applied research, but also
for research versus evaluation and policy studies.


                    Number of       Number of
                    Separate        Institutions
                    Organizations   in Which These
                    Identified      Organizations
                                    Were Located     Types of Institutions

Public education    688             631              37 State education agencies
agencies                                             193 Intermediate service agencies
                                                     401 Local education agencies

Academic            1,268           423              Public and private junior colleges,
                                                     4-year colleges, universities, and
                                                     their divisions; educational R&D
                                                     centers

All others          478             476              Private nonprofit and for-profit
                                                     organizations and noninstructional
                                                     governmental agencies; independent
                                                     education R&D laboratories

The definition of evaluation studies also posed a 
problem. The ARROE staff and their advisors saw the need 
for a fairly restrictive definition, given the propensity 
of some respondents, especially those in public education 
agencies, to include under the heading of evaluation the 
compilation and reporting of periodic or routine 
statistics and information. For this reason, ARROE 
labelled the relevant category "evaluation and policy 
studies," which was defined as: "systematic inquiries 
specifically addressed to policymakers and intended to 



determination of the feasibility of new programs and 
projects, and studies focusing on needs, goals, and 
priorities of action regarding ongoing or contemplated 
activities." Thus, ARROE's definition of evaluation 
activities differs to some extent from those used by 
other investigators and especially by the Committee. 

With respect to the latter, ARROE's definition is both 
more restrictive, because it specifies policy makers as 
the audience, and broader, because it specifically 
includes policy studies. 

Using the ARROE definition, several questions about 
evaluation activities were included in the mail 
questionnaire. Respondents were asked to estimate what 
percentage of their education research, development, 
dissemination, and evaluation (RDD&E) expenditures were 
used primarily for evaluation and policy studies and how 
many full-time and part-time professionals spent the 
greatest percentage of their working hours performing 
evaluation and policy studies. "Project and program 
evaluation" was also listed as one of more than 50 
problem areas among which respondents could select those 
to which their organizational activities were primarily 
directed. 

The discussion on the following pages is based on 
these data and on related analyses of the ARROE data base
(Frankel 1979, Frankel et al. 1979, Lehming 1979, Sharp 
1979, Sharp and Frankel 1979). I believe that this 
discussion is helpful in providing a rough picture of the 
performer universe and especially of those organizations 
that are most active in what is sometimes called the 
evaluation industry. It would be foolhardy to claim a 
high degree of precision for the numbers presented 
here, given such problems as missing data, reluctance on
the part of some performers to respond in detail to questions
on financial affairs and on staffing, and possible 
respondent misinterpretation or distortion. 

Nevertheless, there is enough consistency within the data 
set and enough congruence between the ARROE-based 
findings and those of other investigators to provide 
reasonable confidence about the general trends portrayed 
by the data. 



On the basis of the ARROE data, I estimate that about 
$100 million in federal funds were spent for education 
evaluation in 1977 by extramural performers. These 
estimates are based on three calculations. First, data 
from 80 percent of the 2,434 eligible ARROE respondents 
showed aggregate total expenditures for all education 
research and research-related activities of $735 
million. Adjusting this number for the 20-percent 
nonresponse, I estimate total RDD&E expenditures by 
extramural research performers in 1977 at $900 million. 

Second, data from a subset of respondents (864 
organizations that completed all relevant items on the 
mail questionnaire and reported actual 
expenditures of $355 million) showed that approximately 
22 percent of all RDD&E expenditures were devoted to 
evaluation and policy studies (see Table B-3). Applying 
this proportion to the total ARROE population, I estimate 
that total expenditures for evaluation and policy studies 
in education were approximately $200 million. 4 Third, 
about half of all reported RDD&E expenditures in 1977 
came from federal sources. This proportion may be a 
conservative estimate for evaluation given the 
characteristics of the principal performers (which is 
discussed below). 
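
The three calculations can be retraced with a short Python sketch. It is illustrative only: the inputs are the figures quoted in this paper ($735 million reported by 80 percent of respondents, about 22 percent of RDD&E devoted to evaluation and policy studies, and about half of expenditures from federal sources), while the function name and the proportional scale-up for nonresponse are assumptions made for the example, not a documented ARROE procedure.

```python
# Sketch of the three-step estimate, using only figures quoted in the text.
# Assumption (not stated in the paper): nonrespondents spent, on average,
# as much as respondents, so reported totals are scaled up proportionally.

def estimate_federal_evaluation(reported_musd, response_share,
                                evaluation_share, federal_share):
    """Return (total RDD&E, evaluation, federal evaluation) in $ millions."""
    total_rdde = reported_musd / response_share      # step 1: nonresponse adjustment
    evaluation = total_rdde * evaluation_share       # step 2: evaluation and policy studies
    federal_evaluation = evaluation * federal_share  # step 3: federally funded portion
    return total_rdde, evaluation, federal_evaluation


total_rdde, evaluation, federal_evaluation = estimate_federal_evaluation(
    reported_musd=735.0,    # reported by 80 percent of eligible respondents
    response_share=0.80,
    evaluation_share=0.22,  # approximate share for evaluation and policy studies
    federal_share=0.50,     # about half of reported expenditures were federal
)

print(f"Total RDD&E:        ${total_rdde:,.0f} million")
print(f"Evaluation:         ${evaluation:,.0f} million")
print(f"Federal evaluation: ${federal_evaluation:,.0f} million")
```

Under these assumptions the unrounded results are roughly $919 million, $202 million, and $101 million, which is consistent with the rounded order-of-magnitude figures of $900 million, $200 million, and $100 million used in the text.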

Thus, I estimate that in 1977, extramural performers 
spent at least $100 million for federally funded 
evaluation and policy studies. This figure is 
[...] higher than one would derive for 1979 using 
[Reisner's] data in Appendix A, and it is also much higher 
than one derived from an available inventory of 
competitive contracts awarded by the education agencies 
for fiscal 1977 (Kooi et al. 1978); see Table 
B-4. Nevertheless I am reasonably confident that the 
figure may be a valid order-of-magnitude estimate for 
several reasons: more funding was available in 1977 than 
in 1979 (see Table B-4); Reisner's data do not 
include expenditures by public education agencies (SEAs 
[and LEAs]), which accounted for a sizable proportion of 
the funds expended; Kooi's data do not reflect grants and 
sole-source awards, nor do they include continuing work 
under contracts and grants awarded in earlier years, 
or supplements made through contract 
[modifications], while the ARROE study did include funds 
for continuations and supplements; the ARROE study also 
included performers who received funds from agencies 
Functional Distribution of Evaluation Expenditures by Sector 








                                     1977 a         1979 b

[...] agencies                            -         45,000
Academic institutions               199,000         38,238
[...]s (profit or nonprofit)      5,326,654      2,664,613

                                 $5,525,654     $2,747,851


a [...] from Kooi et al. (1978). 
b Preliminary data from Kooi et al. (in press), made available to 

other than HEW (for example, from DOL or NIH) for work 
that should be classified as education RDD&E; and 
classification differences (in particular the inclusion 
of policy studies) may have inflated the evaluation 
estimates for ARROE. 


SELECTED CHARACTERISTICS OF PERFORMING ORGANIZATIONS 

Who were the performers of evaluation work in 1977 and 
how were federal funds for evaluation distributed among 
various sectors of the performer community? For analytic 
purposes, ARROE classified the performer community into 
three major segments: the public education sector, which 
included state education agencies (SEAs), intermediate 
service agencies (ISAs), and local education agencies 
(LEAs) whose enrollment was 10,000 or more, which in turn were 
subdivided into large LEAs (with enrollments of 50,000 or 
more) and small LEAs (with enrollments of 10,000-49,000); 
the academic sector, which included public and private 
two-year and four-year colleges, universities, and their 
subdivisions, such as R&D centers, specialized 
institutes, and survey units; and a residual sector, 
which was largely composed of profit and not-for-profit 
research and development organizations and educational 
laboratories, but also included hospitals, publishers, 
foundations, associations, and noneducational agencies of 
state and local governments, such as health and manpower 
agencies. 

As shown in Table B-5, academic organizations 
represent the largest single group of performers of 
educational research and related activities, followed by 




                               All RDD&E               Evaluation
                          --------------------    --------------------
                              $                       $
Sector            (N)     Thousands   Percent     Thousands   Percent

Private
  Profit          (22)       31,208       8.8        17,094      21.5
  Other          (131)       95,277      27.1        20,151      25.3
  Total          (153)      126,485      35.9        37,245      46.8
Academic         (474)      147,086      41.4        16,911      21.2
Public
  LEA, small     (109)       11,433       3.2         3,870       4.8
  LEA, large      (34)       20,464       5.8         9,953      12.5
  ISA             (55)       12,896       3.6         3,778       4.7
  SEA             (36)       35,344       9.8         7,873       9.9
  Total          (234)       80,137      22.4        25,474      32.0

TOTAL            (864)      354,490 a   100.0        79,645 b   100.0


a Includes $782,000 not identified by sector. 
b Includes $15,000 not identified by sector. 
SOURCE: ARROE mail respondents only. 


those in the private sector. Public education agencies 
accounted for less than one-fourth of all RDD&E 
expenditures. 5 With respect to evaluation, however, 
the picture is very different. Organizations in the 
private sector were in first place, followed by public 
education agencies, and academic performers had the 
smallest share. Furthermore, as shown in Tables B-6 and 
B-7, only in two types of organizations—private 
for-profit and local school systems—is there a 
concentration of organizations that spent more than 
$100,000 on evaluation in 1977 or devoted most of their 
resources (50 percent or more) to evaluation activities. 
The data clearly suggest that evaluation is a marginal 
activity for most academic performers, while it plays a 
major role in sustaining most for-profit organizations. 
However, given the actual numbers of performers involved, 
one should not conclude that most large evaluation 
dollars were spent by private for-profit organizations in 
1977: 5 for-profit organizations spent in excess of 

$500,000 for evaluation compared with 12 not-for-profit 
Sector               0       1-24     25-50      50+

Private
  For profit       19.2      15.4      26.9      38.5
  All other        23.7      43.9      18.0      14.4
Academic           34.9      41.4      11.6      12.0
Public
  SEA              14.0      51.2      25.6       9.3
  ISA              18.6      52.5      15.3      13.6
  LEA, small        8.8      24.6      26.3      40.4
  LEA, large        5.4      13.5      37.8      43.2


SOURCE: ARROE mail respondents only. 


organizations and 4 academic organizations that spent at least 
that amount. 
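
The contrast between sectors can be made concrete by computing, for each type of performer, the share of its own RDD&E expenditures that went to evaluation. The Python sketch below is illustrative only: the dollar figures (in thousands) are taken from the expenditure table for the 864 mail respondents, while the dictionary layout and function name are assumptions introduced for the example.

```python
# Evaluation as a share of each sector's own RDD&E spending, 1977.
# Figures are $ thousands from the expenditure table for mail respondents:
# (all RDD&E expenditures, evaluation expenditures).
expenditures = {
    "Private, profit": (31_208, 17_094),
    "Private, other": (95_277, 20_151),
    "Academic": (147_086, 16_911),
    "LEA, small": (11_433, 3_870),
    "LEA, large": (20_464, 9_953),
    "ISA": (12_896, 3_778),
    "SEA": (35_344, 7_873),
}


def evaluation_intensity(table):
    """Percent of each sector's RDD&E expenditures devoted to evaluation."""
    return {
        sector: 100.0 * evaluation / all_rdde
        for sector, (all_rdde, evaluation) in table.items()
    }


intensity = evaluation_intensity(expenditures)

# List sectors from most to least evaluation-intensive.
for sector, share in sorted(intensity.items(), key=lambda item: -item[1]):
    print(f"{sector:<16} {share:5.1f} percent")
```

On these figures, for-profit organizations devote more than half of their education RDD&E spending to evaluation and academic organizations a little over one-tenth, which is the pattern described in the text.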

There are sharp differences among organizations in the 
various sectors of the performing universe. The balance 
of this section examines separately some salient features 
of evaluation performers in each of the three sectors. 


For-Profit and Not-for-Profit Organizations 
in the Private Sector 6 

What is sometimes referred to as the evaluation industry 
is a group of organizations (some profit, some 
not-for-profit, some large, others quite modest) that are 
at present the most frequent performers of federally 
funded evaluations in the field of education. With the 
emergence and the predominance of the competitive 
procurement system and the funding of evaluations under 
contracts rather than grants, organizations of this type 
are apparently best able to mount the prodigious proposal 
writing efforts required for participation in the system 
and to muster and manage the resources necessary to carry 
out large-scale evaluation projects, often under severe 
time constraints. 

Obviously, the ARROE data collection effort, since it 
was not targeted to performers of federally funded 
evaluation but sought instead to capture the universe of 
organizations that contributed to research, development, 
and evaluation in education in 1977, failed to isolate [...] 



Nevertheless, some of the findings are instructive: 211 
of the 478 organizations in the residual sector (i.e., 
affiliated neither with academic institutions nor with 
public education agencies) were classified as R&D 
organizations and thus constitute the universe of 
organizations potentially involved in the "evaluation 
industry" (see Table B-8). Most of these 211 
organizations spent less than $1 million on all research 
and research-related activities in 1977, regardless of 
source of funding. The 77 organizations that spent $1 
million or more in 1977 include the federally funded 
educational laboratories (a group of not-for-profit 
institutions started with federal funding but now partly 
dependent on grant and contract work) and a number of 
not-for-profit groups primarily oriented to the field of 
education or educational administration. The ARROE data 
are incomplete (about one-third of the respondents did 
not wish to disclose the information or have their names 
associated with the information if they did disclose it) 
but no more than 15 organizations were identified that 
are members of the "industry" as popularly conceived 
(System Development Corporation, Abt Associates, American 
Institutes for Research, Educational Testing Service 
[ETS], etc.). Only three such organizations are among 
the 10 private-sector organizations that reported 
expenditures of more than $5 million for all education 
RDD&E; the other 7 organizations were educational 
laboratories, not-for-profit education centers, and 
hospitals, presumably engaged in research centered on the 
education of medical personnel. 7 

More than other organizations, those in the private 
sector and especially the major performers depend heavily 
on federal funding for their activities. According to 
the ARROE study, 62 percent of the funding for the 
private sector came from federal sources compared with 48 
percent for the academic sector. Academic institutions 
rely to a greater extent on state and local government 
funding: 19 percent of education RDD&E work in the 
academic sector was funded from state and local sources, 
but only 10 percent of the work in the private sector. 
[Large] private-sector organizations and organizations that 
specialize in education RDD&E in particular have few 
other sources of funding: half of the organizations that 
spent more than $1 million in 1975 for education RDD&E 
received at least 75 percent of their funds from the 


                                                 Organizations Spending
                           All Organizations     $1 Million or More
                           -----------------     ----------------------
                             N      Percent         N      Percent

Education RDD&E             155        35           26        47
Other RDD&E                  56        13            9        16
Non-RDD&E                   213        48           18        33
  Health care                50        11            3         5
  Associations,
    labor unions             35         8            3         5
  Private schools            24         5            -         -
  Social science             17         4            -         -
  Child care                 16         4            -         -
  All others                 71        16           12 b      22
Government agencies a        23         5            2         4

TOTAL                       447 c     100           55 d     100


a Includes government agencies other than public education 
  agencies. 
b Publishing, broadcasting: 2; management consulting: 2; 
  information services: 2; other: 6. 
c Information not available for 31 cases. 
d Information not available for 3 cases. 

SOURCE: ARROE mail and telephone respondents. 


federal government, and one-fourth of them received at 
least 90 percent from the federal government. 

The ARROE data show that large performers 
(expenditures of $1 million or more) account for the bulk 
of all expenditures in education RDD&E in the private 
sector: while they are 18 percent of all organizations 
listed in ARROE, they accounted for 77 percent of all 
reported expenditures. For the subset of organizations 
for which there are more detailed data, the picture was 
similar; furthermore, expenditures for evaluation are even 
more heavily concentrated among major performers than are 
expenditures for all RDD&E (see Table B-9). But these 
performers do not fit the image of an industry whose only 
activity and source of revenue is the performance of 
evaluations in the field of education: federally funded 
evaluation work is concentrated in large organizations 
with diversified activities that encompass various 
topical areas (for example, the Rand Corporation, Abt 
Associates, and Applied Management Sciences) or several 
different research functions or activities in education 
(for example, ETS). 



                             Total RDD&E           Expenditures
                             Expenditures          for Evaluation
                             in 1977               in 1977
                             -----------------     -----------------
                             Percent   Number      Percent   Number

Private sector
  All organizations           100.0      354        100.0      153
  Major organizations          79.6       58         82.7       32
  All other organizations      20.4      296         17.3      121
Academic sector
  All organizations           100.0      943        100.0      474
  Major organizations          50.1       92         46.1       39
  Minor organizations          49.9      851         53.9      435


a Major performers are those who spent more than $1 million for 
all RDD&E activities in 1977; minor performers are all others. 
NOTE: "Total RDD&E Expenditures" column is based on both mail 
and telephone respondents. "Expenditures for Evaluation" column 
is based on mail respondents only. All cases with missing data 
were excluded. 


This is not to say that one or another organization 
may not have come into existence for the purpose of only 
such activities—or even for the purpose of performing a 
single contract with a given agency, a practice 
highlighted in a recent GAO report, especially with 
respect to former employees (U.S. General Accounting 
Office 1980). Small performers do carry out a fair 
amount of educational research and research-related work, 
and some may fit the image of the "beltway bandits" so 
prominently mentioned in all the periodic exposes of the 
research and contract world. It is also possible that 
such respondents were especially unlikely to return the 
ARROE questionnaire and were interviewed by telephone and 
so were underrepresented in the group from whom detailed 
information was obtained. However, the evidence 
indicates that the bulk of evaluation work is done by a 
relatively small number of well-established and fairly 
large organizations. This hypothesized distribution of 
activities across types of organizations is confirmed by 
an (incomplete) inventory of competitive evaluation 
contract awards made in 1977 and 1979 (Kooi et al. 1978, 


led to similar conclusions: while it identified a large 
number of active organizations in the competitive 
procurement process, it found that awards for the 
unrestricted, open procurements most often went to very 
active bidders, usually large organizations. Since 1972, 
with increasing emphasis on open competitions, this trend 
has no doubt accelerated. 

As is shown in the next section of this paper, the 
major performers of evaluations have large professional 
staffs drawn from a wide range of disciplinary 
backgrounds. Less is known about the smaller 
organizations that perform the balance of federally 
funded evaluations; their activities and staffing 
patterns are largely undocumented since they have not 
become part of the professional and disciplinary networks 
in which the large organizations participate. 


Evaluation in Academic Institutions 

As was shown in Table B-2, evaluation clearly represented 
a smaller share of total RDD&E activities for academic 
organizations than for other performers. Furthermore, 
despite the fact that academic organizations are the 
largest performers of all education RDD&E, the dollar 
amounts involved in evaluation work were relatively 
small. It is not possible to ascertain from the ARROE 
data to what extent academic evaluation expenditures were 
funded with federal dollars obtained directly through a 
grant or contract from one of the education agencies in 
HEW or with federal dollars that had gone to a state or 
local agency that in turn contracted the evaluation to a 
college or university. 

When social-science-based evaluation was first used to 
assess social programs, academic institutions were 
frequent performers of major evaluations, usually under 
grant or sole-source contract arrangements. The reasons 
for a gradual shift from grants to contracts and from 
academic to other types of research performers have been 
amply discussed in a number of publications (see, e.g., 
Williams 1972), most recently by Levitan and Wurzburg 
(1979), who claim that by 1974 HEW had ruled out further 
support of evaluations under grants and that sole-source 
contracting became increasingly difficult. They report 
that by 1979 officials estimated that less than 10 
percent of HEW evaluation funds were awarded 


that they do not win many competitive awards: 

Kooi’s inventories of competitive procurements for 1977 
and 1979 showed only one study in each of the two years 
that could be unequivocally classified as an evaluation 
study competitively awarded to an academic institution. 

In their study of evaluation performers, Biderman and 
Sharp (1972) found that only 11 percent of the 1,324 
organizations identified as RFP recipients were 
academically affiliated institutions, and the majority of 
these had received the RFP at the agency’s initiative. A 
total of 225 bids were filed for 36 procurements; only 17 
of them were submitted by academically affiliated 
organizations; and only one award, not for an evaluation 
study, went to an academic organization. These earlier 
data suggested that academic organizations did not 
participate very actively in the federally organized 
competitive procurement system at that time, and this may 
not have changed a great deal since. 


Evaluation in Public Education Agencies 

Federal dollars are spent by state and local public 
education agencies primarily to perform evaluations that 
are mandated in conjunction with federally funded 
education activities. In addition, state or local 
agencies may carry out federally funded demonstration or 
research projects that have built-in evaluation 
components. State or local agencies can also participate 
in competitions for evaluation contracts; this is rare, 
however, since there are more restricted types of 
competitive procurements (for example, for various 
demonstration and innovative programs) that are targeted 
primarily to public education agencies and hence are 
preferred by them. 

As shown in Table B-6 above, evaluation occupies a 
more prominent place in the activities of local education 
agencies than in those of any other sector: more than 40 
percent of such agencies included in the ARROE study 
indicated that more than half of their research and 
research-related activities were devoted to evaluation. 
The resources of these education agencies are often 


LEAs: Los Angeles and Leon County, Florida. However, 
many of the evaluation activities undertaken by such 
agencies tend to rely heavily on student tests, so that 
the boundaries between "testing" and "evaluation" are 
often hard to draw. It may be for this reason, or 
perhaps because LEAs do not always identify sources of 
evaluation funding accurately, that LEAs appear to be 
somewhat less dependent on federal funds than are other 
public agencies to carry out their evaluation activities 
(see Table B-10). 

Evaluation—at least as defined for the ARROE 
study—plays a lesser part for state agencies than it 
does at the local level, but (as shown in Table B-3 
above) the actual amounts involved are larger because of 
the higher expenditure levels in these agencies. 
Relatively few state and intermediate service agencies 
spent more than 25 percent of their RDD&E resources on 
evaluation. 

According to the National Science Foundation (NSF) 
(1980), local personnel generally tend to perform most 

TABLE B-10 Percent of All Organizations Reporting That 
Half or More of Their Funds Came From Federal Sources 
in 1977 

                   Organizations for Which    Organizations for Which
                   Evaluation Was a           Evaluation Was Not a
                   Major Activity             Major Activity
                   -----------------------    -----------------------
                   Number       Percent       Number       Percent

Public
  SEA                26           73.1          18           50.0
  ISA                37           35.1          23           30.4
  LEA, large         33           24.2           5           60.0
  LEA, small         84           11.9          28           35.7
Academic            241           36.9         290           39.3
Private
  Major              16           68.6           8           62.5
  All other          70           52.9          74           62.2


NOTE: Organizations could check more than one "major activity" 
area. 

SOURCE: ARROE mail questionnaire respondents. 



portion performed extramurally has increased in recent 
years, from 20 percent in 1966 to close to 40 percent in 
1977. Of that 40 percent, private firms performed 17 
percent; not-for-profit firms, 13 percent; and 
universities and colleges, about 10 percent. The extent 
to which this pattern holds for education as compared 
with energy, environment, health, etc. cannot be 
ascertained from the NSF data. However, information from 
a recent survey of school districts (Lyon 1978) indicates 
that on the average only 6 percent of the budget of a 
district's evaluation units was spent on outside 
consultants, although there was considerable variation 
from district to district. State agencies, too, appear 
to perform most work in-house: one recent study reports 
that 73.3 percent of all research and research-related 
activities are conducted by agency staffs (Mathis and 
Walling 1979). 


PERSONNEL 

The organizations included in ARROE employed 
approximately 22,200 full-time and 12,000 part-time 
professionals in 1977. The distribution of personnel 
matches the distribution of funds, although in the 
aggregate, academic institutions allocate more persons 
per dollar than organizations in the other sectors (see 
Table B-11). Staff qualifications vary by sector, with 
those in academic organizations most likely to hold a 


TABLE B-11 Staffing and Funding Allocation for Education 
RDD&E, by Sector, 1977 (in percentages) 

                     Full-Time        Part-Time
Sector               Professionals    Professionals    Funding

Private                   27               16             33
Academic                  58               76             51
Public                    15                7             16

TOTAL (percent)          100              100            100
Number                22,286           12,024          $735 million a


a Based on reports from 80 percent of respondents. 

SOURCE: ARROE mail and telephone respondents. 



wider spectrum of academic disciplines (see Table B-12). 

As was noted above, most organizations do not 
specialize in evaluation, and therefore staff is likely 
to be used interchangeably between evaluation and 
research. Insofar as the ARROE data allow 
differentiation, however, the following characteristics 
apply to those staff who actually worked on evaluation 
studies in 1977. First, the percentage of total staff 
allocated to all evaluation was slightly lower than the 
percentage of expenditures: 22 percent of funds and 17 
percent of personnel were devoted to evaluation and 
policy studies. This is not unexpected since the 
staff/dollar ratio for all RDD&E is highest in the 
academic sector and lowest in the private sector (see 
Table B-11) and the private sector is the most frequent 
performer of evaluations. In the absence of data, one 
can only speculate about the reasons for the difference 
in staff/dollar ratio. It may be due to the greater 


TABLE B-12 Selected Characteristics of Full-Time Staff, 
by Sector, 1977 

                                           Public   Academic   Private

Percent of full-time staff
  with doctorates                            28        67         31

Percent with major field
of expertise in:
  Education                                  65        58         41
  Psychology                                  9        10         16
  Other social science                        3         9         12
  Humanities                                  2         2          5
  Physical and biological sciences            1         7          2
  Mathematics, statistics                     7         2          5
  Business economics, accounting,
    public administration                     3         2          5
  Communications, library science             3         3          7
  Operations research, systems analysis       4         1          4
  Other                                       3         6          4


SOURCE: ARROE mail respondents only; response rate to this 
question was 40 percent. 
be able to cover overhead or some personnel costs from 
regular budgets. The availability of low-cost labor 
(graduate students and post-doctoral fellows) on many 
campuses may also be reflected in these figures; the data 
in Table B-ll suggest that academic institutions are able 
to take advantage of the availability of faculty or 
students for part-time employment. However, the 
difference in staff/dollar ratios may also be due to the 
fact that private contractors and grantees spend higher 
proportions of their funds on nonpersonnel items such as 
computer work, which is often available at relatively low 
cost in university settings. Another factor may be high 
overhead costs in the private sector due, in part, to 
proposal writing or marketing costs that are especially 
high in that sector. 
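
The staff/dollar comparison can be illustrated by dividing each sector's share of full-time professionals by its share of funding, using the percentages in Table B-11. This Python sketch is a rough illustration only: the ratio of shares is an index chosen for the example (a value above 1 means more staff per dollar than the all-sector average), not a measure used in the ARROE study, and the variable and function names are hypothetical.

```python
# Index of full-time staff per funding dollar, by sector, 1977.
# Percentages are from Table B-11: (share of full-time professionals,
# share of part-time professionals, share of funding).
shares = {
    "Private": (27, 16, 33),
    "Academic": (58, 76, 51),
    "Public": (15, 7, 16),
}


def staff_per_dollar_index(table):
    """Ratio of full-time staff share to funding share for each sector."""
    return {
        sector: full_time / funding
        for sector, (full_time, _part_time, funding) in table.items()
    }


index = staff_per_dollar_index(shares)

for sector, value in index.items():
    print(f"{sector:<10} {value:.2f}")
```

On these figures the index is above 1 for the academic sector and below 1 for the private sector, in line with the observation that academic institutions allocate more persons per dollar.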

Second, there are also some noteworthy differences 
with respect to staff training. Table B-13, which 
presents differences in the presence of doctorate holders 
on the staffs of reporting organizations, uses a 
different base from most of the other data shown in this 
paper. Organizations were categorized according to their 
answer to a question about major activity areas, one of 
which was program and project evaluation. (Respondents 
were free to check as many areas as applied to their 
organizations, and most checked more than one.) 

Respondents were then classified into evaluators and 
nonevaluators based on their answers. 9 Again it is 
necessary to bear in mind that not all evaluation 
performers are in the "evaluator" category, but only 
those who indicated that evaluation was a major 
activity. Although in many cases the cell sizes are 
quite small, some comparisons can be made: in the 
academic sector, the participation in research and 
research-related activities of those who have Ph.D.s is 
ubiquitous. About three-fourths of all academic units 
performing this type of work employ Ph.D.s, whether they 
do evaluations or not. In most other types of 
organizations, there tends to be at least one person with 
a Ph.D. on the staff, but the number of Ph.D.s is greater 
if one of the major activities is evaluation work. The 
difference is especially striking in public agencies, but 
in the private sector, too, evaluation performers almost 
always have at least one person with a Ph.D. on the 
staff. Only in state agencies does the presence of 
evaluation activities not affect staff characteristics: 


TABLE B-13 Selected Characteristics of Organizations With and Without Evaluation 
SOURCE: ARROE mail respondents only. 



For-profit organizations are especially likely to employ 
Ph.D.s if they are engaged in evaluation. It should be 
noted, however, that the data in this category are from a 
small number of organizations. 

Equally interesting differences can be observed with 
respect to staff specialization, i.e., the presence of 
disciplinary specialists on an organization's staff. 

Table B-14 shows that organizations for which evaluation 
is a major activity tend to have more diversified 
staffs. This is especially the case in the private 
sector, but holds true in the other sectors as well. 

Obviously, staff size, percent of staff with Ph.D.s, 
and diversification of disciplines among the staff are 
not in themselves a guarantee of efficient or 
high-quality performance; in the aggregate, however, they 
furnish some indication of the efforts expended by those 
who carry out evaluation work within the educational 
research community. Generally, the performers of 
evaluation activities tend to be organizations with 
staffs that are larger, better trained, and more 
diversified than the staffs in organizations for which 
other types of research and research-related activities 
constitute a major activity. 


CONCLUSION AND COMMENT 

Despite the difficulties of distinguishing between those 
who perform evaluations and those who perform other types 
of educational research, and between those who are funded 
from federal sources and those who are not, some 
differences among performers emerge from the ARROE data. 
Of greatest interest are differences between academic and 
private-sector organizations, since they are the true 
outsiders who perform evaluations under federal 
auspices. The public agencies are important performers 
and their activities are of crucial importance in the 
assessment and evaluation of the impact of federal 
dollars spent on education, but the mechanisms at the 
disposal of the federal government in initiating and 
monitoring evaluations in the public sector are very 
different from those that apply to contracts and grants 
awarded to academic and private organizations. 

Furthermore, public evaluation units exist and function 
to a large extent in a self-contained universe, while the 


(in percentages) 

                                                    Math and
                         Education   Psychology    Statistics    Other

Private profit
  Major evaluation
    performer              100.0        71.4          35.7       100.0
  Other                     71.4        57.1          50.0        53.8
Private other
  Major evaluation
    performer               87.1        42.6          16.2        66.7
  Other                     65.5        29.3           5.2        66.7
Academic
  Major evaluation
    performer               78.8        37.6          16.5        52.0
  Other                     67.7        30.3          12.2        46.5
Small LEA
  Major evaluation
    performer               85.9        22.5          19.2        33.3
  Other                     88.9        11.1          11.1        25.7
Large LEA
  Major evaluation
    performer               85.3        55.9          51.5        16.7
  Other                     83.3        50.0          23.3        40.6
ISA
  Major evaluation
    performer               93.1        34.5          31.0        33.3
  Other                     73.3        13.3           6.7        55.2
SEA
  Major evaluation
    performer               96.3        22.2          33.3        56.2
  Other                     94.1        25.0          22.2        70.4
All organizations
  Major evaluation
    performer               84.5        37.2          73.8        53.2
  Other                     64.3        29.0          12.4        49.4


SOURCE: ARROE mail respondents only. 


two other sectors compete, interact, and cooperate with 
respect to much of the evaluation work and related 
activities. 

It is clear from the ARROE data that academic units 
continue to do the bulk of educational research in 
general and that large numbers of well-qualified persons 
are involved in such activities. Universities have at 



utilization is often economical and advantageous. 
Therefore, it seems unfortunate that academic 
institutions participate so little in one of the most 
important segments of the work being done today in the 
field of educational research, namely evaluation. While 
private organizations can to some extent duplicate 
university staffing arrangements through the use of 
consultants, including academic consultants, this often 
requires travel, less opportunity for day-by-day 
involvement, and higher costs. Such arrangements also 
cannot provide the opportunity available at universities 
for faculty and graduate students to stay in close touch 
with practical problems and federal concerns and for 
better articulation between graduate training and 
employment requirements. 

But it is also worth noting that as a result of the 
shift to the private sector, a number of organizations 
have emerged that have large, sophisticated, 
multidisciplinary staffs that are very knowledgeable 
about the major educational issues of the day. Whether 
the present federal procurement system leads to the best 
possible utilization of these resources is not clear: 
earlier research (Biderman and Sharp 1972) and anecdotal 
evidence suggest that the timing of requests for 
proposals, the imposition of tight deadlines coupled with 
time-consuming clearance procedures, and the need to 
devote enormous efforts to proposal preparation all 
militate against optimal utilization. In any case, the 
maintenance of this capability is far from certain, given 
the reduction in the volume of federal evaluation 
procurements in education and the ability of many of the 
private-sector firms to redeploy personnel to areas such 
as energy, or transportation, or defense, which may be of 
higher priority than education. The loss of these 
specialists will be detrimental to the field of 
educational research, which has long suffered from a 
narrow and parochial perspective. 

As the report and other cited sources show, a 
convincing case can be made that the current procurement 
system is not designed for optimal efficiency. 
Increasingly, the choice of grants or contracts as a 
means of supporting work is not based on substantive 
considerations, and the eligibility criteria (based on 
such categorical descriptors as profit or not-for-profit, 
minority-owned, etc.) may preclude performance by 
... low federal personnel ceilings (Sharkansky 1980), but it 
needs to be made more flexible. The data presented in 
this paper suggest that most evaluation work in education 
commissioned at the national level is done by performers 
who have the experience and resources to perform it well, 
despite occasional awards that are open to question (U.S. 
General Accounting Office 1980). But the universe of 
performers is a relatively narrow one. The 
diversification of this universe through greater 
participation by university-based research groups, the 
preservation of existing proven resources in the private 
sector, and improvements in the procurement system should 
be of concern to those who seek to increase the quality 
and utility of evaluations. 


NOTES 

1 This estimate is based on Abramson's data (1978), 
which showed for 1977 a total of only $63.6 million 
for all federally funded evaluations. While 
Abramson's definition of evaluation yields a much 
lower estimate of total evaluation activities than is 
generally used by other researchers, this figure can 
be used to gauge the relative shares of expenditures 
by various government agencies. Of the $63.6 million, 
HEW accounted for more than half, with welfare 
agencies accounting for the largest bloc (more than 
$16 million) and education for the second largest 
(close to $14 million). 

2 Because of item nonresponse, especially with respect 
to funding questions, the actual number of cases 
available for analysis is usually somewhat smaller. 

3 Especially in academic institutions, it is not 
uncommon to have several separate, autonomous units 
(for example a school of education, a survey research 
unit, and the department of psychology) performing 
education research and research-related activities. 
Of the 1,268 academic organizations shown in Table 
B-2, the largest number (34 percent) were individual 
departments, followed by divisions or schools (24 
percent) and bureaus and centers (24 percent). 

4 The data files were examined for nonresponse bias and 
for mail versus telephone respondent bias, as well as 
... the variables available for this analysis (size of 
organization, sector, etc.), there were no obvious 
biases, but of course there is the always unanswered 
question about characteristics of reluctant 
respondents or nonrespondents that demographic 
variables do not capture. 

5 These data are based on the subset of mail 
respondents. Total expenditure data for all ARROE 
organizations showed the same ranking and order of 
magnitude, but slightly different percentages 
(academic 57 percent, private 33 percent, public 16 
percent), suggesting that "active" public education 
agencies were more likely to return the mail 
questionnaires. 

6 As shown in Table B-8, this nomenclature includes a 
few government agencies other than public education 
agencies. 

7 The 10 private-sector organizations that reported 
expenditures of more than $5 million (in most cases 
for fiscal 1977) are Abt Associates, Inc., Education 
Commission of the States, Education Development 
Center, Inc., Education Finance Center, Educational 
Testing Service, Far West Laboratory for Educational 
Research and Development, Montefiore Hospital and 
Medical Center, Northwest Regional Education 
Laboratory, St. Louis Children's Hospital, and System 
Development Corporation. 

8 None of the contracts criticized on this basis in the 
GAO report were awarded by an education agency. 

9 I am indebted to Georgine Pion and Robert Boruch of 
Northwestern University for suggesting these 
tabulations and making funds available for the 
required computer work. 

10 But it should be kept in mind that ARROE encompasses a 
highly diverse set of organizations, including some 
that specialize in development and dissemination, for 
which these same characteristics may not be relevant 
to work performance. 


APPENDIX C 


How the Evaluation System Works: 
The State and Local Levels 
Freda M. Holley 


Kaleidoscopic is a good term to describe the evaluation 
of federal programs at the local and state level. There 
is enormous variation both from state to state and from 
district to district. Moreover, the practice of 
evaluation differs across programs within those states 
and within those districts. 

This paper attempts to give some flavor of that 
variation in such areas as evaluation funding and 
budgets, personnel, evaluation activities and practices, 
and, finally, in dissemination and utilization. The 
paper concludes with some discussion of the implications 
of this variation. The reader is cautioned against a 
quick assumption that such variation is undesirable: it 
may well be that such variation is not helpful to those 
making decisions at the federal level, but it must be 
remembered that national program success can only be 
built block by block at the local level. Considerable 
variation may be necessary to foster program 
implementation and to respond to differing needs at the 
local level. Imagination may be required at the national 
level to use such variation creatively to the benefit of 
national purposes. It may also be necessary to recognize 
that it is pointless to attempt evaluation at the 
national level; one evaluation system cannot serve 
local, state, and national needs alike. In any case, the 
... work to optimize the return from evaluation efforts at 
the local and state level, both for local and national 
aims. 


HOW ARE EVALUATIONS FUNDED? 

Our best evidence on the extent of variation in federal 
program evaluation at the state education agency (SEA) 
and local education agency (LEA) levels is related to 
evaluation budgets. Budgets are a major concern in local 
and state evaluation efforts, of course, and for this 
reason most of the data collection has focused on them. 
The most recent data were collected in a survey of state 
and large city evaluation units on behalf of a task force 
on resource allocation in program evaluation appointed by 
Division H (School Evaluation and Program Development) of 
the American Educational Research Association (AERA). 

This survey (Drezek and Higgins 1980) reported that the 
size of LEA budgets for the evaluation of Title I 
programs ranged from zero to $935,000 for Title I program 
budgets of $104,000 to $52 million. Similarly, median 
reported evaluation funding, expressed as a percent of 
program funding, ranged across major programs from 7 
percent for ESEA Title IVC (innovative practices and 
curriculum) to 0.5 percent for P.L. 94-142 (special 
education); see Table C-1 for details. 

Doss (1979) surveyed large districts in the Southwest 
in order to gather descriptive information about their 
Title I evaluation efforts. This survey reveals similar 
variation: one program with a $3,563,071 budget had an 
evaluation budget of $10,000; another program with a 
budget of $2,447,020 had an evaluation budget of $88,036 
(see Table C-2). The percentages reported by Doss 
closely parallel those from a telephone survey reported 
by Boruch and Cordray (1980). That survey, conducted as 
a part of their larger appraisal of federal program 
evaluation, indicated that in larger districts (defined 
as those with enrollments of 25,000 and above), 1.6 
percent of Title I allocations went to evaluation. 

Webster and Stufflebeam (1978) surveyed urban 
districts nationally to gather descriptive information 
about the practice of evaluation in large school 
systems. Although their data are not specific as to 
federal program source, the indication of the variation 
in the amount of federal funds available for evaluation 
also parallels the findings from the later studies (see 
Table C-3). As Table C-3 shows, federal funds constitute 
a considerable portion of most school district evaluation 
resources. This is somewhat at odds with the finding in 
Lyon and Doscher (1979) that the funding for the 
average evaluation office is 65 percent local, 18 percent 
federal, 15 percent state, and 1 percent other. This 
discrepancy may be related to urban differences and to 
whether flow-through monies are treated as state or 
federal resources. 


TABLE C-2 

            Total Title I     Title I Evaluation 
District    Budget ($)        Budget ($)            Percent 

A           no response            75,000             - 
B            2,660,923             25,000             0.9 
C            4,311,745             69,607             1.6 
D            4,188,526             66,320             1.6 
E           12,277,805             75,000             0.6 
F            3,374,458             43,000             1.3 
G            9,450,000            202,973             2.2 
H            4,500,000            115,661             2.6 
I            3,563,071             10,000             0.3 
J            2,975,878             36,740             1.2 
K            2,447,020             88,036             3.6 
L            5,485,432             50,999             0.9 

Mean         5,021,351             71,212 a           1.4 
Median       4,188,526             66,320 a           1.6 

a Includes only those districts reporting both Title I and 
evaluation expenditures. 

SOURCE: Doss (1979). 
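
The summary rows and the percent column of Table C-2 can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original paper; the dictionary name TABLE_C2 and the helper evaluation_share are illustrative names introduced here, and district A is left out because it reported no Title I budget.

```python
from statistics import mean, median

# Dollar figures transcribed from Table C-2 (Doss 1979):
# district -> (total Title I budget, Title I evaluation budget).
# District A is omitted because it reported no Title I budget.
TABLE_C2 = {
    "B": (2_660_923, 25_000),
    "C": (4_311_745, 69_607),
    "D": (4_188_526, 66_320),
    "E": (12_277_805, 75_000),
    "F": (3_374_458, 43_000),
    "G": (9_450_000, 202_973),
    "H": (4_500_000, 115_661),
    "I": (3_563_071, 10_000),
    "J": (2_975_878, 36_740),
    "K": (2_447_020, 88_036),
    "L": (5_485_432, 50_999),
}


def evaluation_share(program_budget, evaluation_budget):
    """Return the evaluation budget as a percentage of the program budget."""
    return 100.0 * evaluation_budget / program_budget


# Percent column, rounded to one decimal place as in the printed table.
shares = {
    district: round(evaluation_share(program, evaluation), 1)
    for district, (program, evaluation) in TABLE_C2.items()
}

# Summary rows over the eleven districts reporting both figures.
mean_program = round(mean(program for program, _ in TABLE_C2.values()))
mean_evaluation = round(mean(evaluation for _, evaluation in TABLE_C2.values()))
median_program = median(program for program, _ in TABLE_C2.values())
median_evaluation = median(evaluation for _, evaluation in TABLE_C2.values())

if __name__ == "__main__":
    for district, share in shares.items():
        print(f"{district}: {share:.1f} percent")
    print("mean:", mean_program, mean_evaluation)
    print("median:", median_program, median_evaluation)
```

The recomputed means and medians agree with the printed summary rows. For district G the recomputed share is 2.1 percent against the 2.2 percent printed, which may simply reflect rounding in the reported budget figures.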

The ranges of funding are as great as they are 
primarily because of the way in which evaluation funding 
is secured and secondarily because of differences in 
evaluation requirements across federal programs and 
across state agencies. One way to illustrate the 
situation is to describe how funds for evaluation of 
three different federal programs are typically secured 
using the experience of one district as a focal point of 
the description. The district is the Austin Independent 
School District, Austin, Texas. Although procedures are 
not exactly the same in other districts, there is 
considerable similarity. 

TABLE C-3 (table body not recovered) 

... different departments. 

SOURCE: Webster and Stufflebeam (1978). 





Title I evaluation is the largest federal program 
activity in the Austin Independent School District (Austin 
ISD) as it typically is in all SEA and LEA evaluation 
units. LEA funds for evaluation are secured as a part of 
an application to the SEA. The evaluation is developed 
by the Austin ISD as one component of an overall Title I 
program. The component sets out the scope of work to be 
performed, identifies the personnel to carry it out, and 
develops a budget for the activity. The amount of the 
budget for the evaluation component is initially 
established by the district on the basis of a district 
policy statement that ties evaluation funding to program 
size on a sliding-scale guideline. (This approach is not 
typical since most agencies lack such a policy 
statement.) What goes into the Title I application for 
evaluation is generally affected by the attitude of the 
LEAs toward evaluation, the way in which the application 
content is controlled within the LEA, the evaluation 
capability of the LEA, and in turn, by all those same 
factors at the SEA level. In the Austin ISD, the 
development of applications is watched rather closely by 
both the school board and the top district 
management. Moreover, the staff of the department 
handling federal program fund applications is favorable 
toward research. In Austin at one time, and in many 
districts today, the application content could be almost 
entirely controlled by the application writer. When this 
is true and when the writer is not favorable toward 
evaluation, it can have considerable impact on the 
evaluation capability. 

Once developed, the application is negotiated by the 
district program officer with the SEA program officer. 
The entire application is generally under the supervision 
of one SEA consultant; the SEA evaluation unit will 
almost never be involved in the review or negotiation of 
the application. Similarly, the district evaluation 
staff will typically not be involved in the negotiation. 
The SEA program officer is very unlikely to have seen the 
district evaluation report from the previous year and may 
well have little appreciation for the cost of 
evaluation. Since the LEA program officer will likely 
negotiate with the SEA program officer, the former's 
willingness to support the evaluation budget will be 
crucial at this point. When this kind of situation 
exists, of course, the positive or negative nature of the ... 

In summary, the Title I evaluation budget at the local 
level may be influenced by a number of political factors, 
many of which will not favor rigorous evaluation and 
reporting. A better model would provide for involvement 
of the SEA evaluation staff throughout the application 
and approval process. Not only would evaluation 
activities get less one-sided consideration, but—more 
important—evaluation staff could introduce improvements 
into the program plans based on the results of completed 
studies. 


Emergency School Aid Act (ESAA) 

ESAA programs have been another source of considerable 
evaluation funding in the past, particularly for urban 
school districts. When the initial guidelines for 
application were issued, they were in many ways model 
guidelines for the development of high-quality 
educational proposals and programs. They set up criteria 
for scoring proposals that were based on a number of 
aspects of the program including the objectives and the 
evaluation. The forms were laid out so that the 
activities and evaluation should flow from the 
objectives. It has been interesting to watch what has 
happened to the actual awarding of grants in view of that 
model. 

The Austin ISD annually goes through an elaborate 
process of proposal development that involves community 
hearings, working with an advisory group, and extensive 
staff involvement. The product of such extensive 
political input is usually a huge, uncontrolled set of 
small fragmented components, one of which is evaluation. 
In the Austin ISD the resulting product usually involves 
every school campus, some community outreach, and various 
disciplines from counseling to remedial reading. Even 
under normal resource constraints, an evaluator would 
stand in awe of trying to develop accountability measures 
for implementation and achievement of objectives. There 
are, however, some additional resource constraints that 
have at times made the task out of the question; they are 
discussed below. 

After the proposals are put in final form by the LEA, 
they are reviewed by SEA representatives and submitted to 
the federal level. Until 1979, proposals were submitted 
... brings reader panels in to review the proposals. These 
readers try to apply the criteria set up in the ESAA 
application process to the proposals. These readers are 
often ESAA program officers from other LEAs and from 
SEAs. Again, these readers are unlikely to have any 
knowledge of evaluation. Often neither readers nor program 
officers understand the sophisticated set of 
criteria originally established for ESAA. For example, 
the original guidelines called for awarding points on the 
basis of well-developed objectives. Specific percentages 
were mentioned as desirable. At least regionally this 
was eventually interpreted as "the more percentages the 
better." This eventually led to such meaningless 
objectives as "10% of a 10% sample of high school 
students will score 75% on a measure of involvement." 
Our office was told at one point that a comparison based 
on a significantly higher performance of a program group 
over a control group was unacceptable. 

In the early 1970s, the Austin ISD did try meaningful 
evaluation in ESAA programs several times. We had 
budgets of as much as $84,000 for a program with a budget 
of $840,000 for the ESAA bilingual component. (At one 
time, Austin had three large ESAA programs: basic, 
pilot, and bilingual, so that the annual ESAA program 
budgets totalled almost $2 million.) More recently, as 
the impact of Austin's last court order on desegregation 
declined, funding declined as well, and evaluation 
budgets fell more drastically than program budgets. 

Thus, for the last three years, the evaluation/program 
budgets for ESAA basic (the only component remaining in 
Austin by last year) have been respectively: $3,000 and 
$163,970 in 1979; $12,000 and $414,255 in 1978; and 
$5,400 and $488,900 in 1977. The drastic decline in the 
evaluation budget from the early years to 1977 was due to 
a regional, or perhaps national, interpretation of the 
legislation that a set-aside of 1 percent for national 
evaluation was a limit on local evaluation as well. Of 
course, there is a considerable difference between what 
can be done with 1 percent nationally and what can be 
done with 1 percent of a small local budget. Any true 
evaluation of local ESAA became impossible even when that 
evaluation was merely the mandated measurement of 
objectives set out by the SEA. Such objectives had to be 
carefully written around what could be measured by using 
existing district data, whether they had a strong ... 

However, by that time we had learned that ESAA grants 
were generally going to be funded late and that, 
consequently, program implementation would lag badly. We 
could predict that results from the program would not be 
significant. In addition, for some reason, Austin has 
consistently been placed on hold by the Office for Civil 
Rights for the receipt of ESAA funds, and programs do not 
begin until after school begins, too late for hiring good 
staff or developing good programs. ESAA seems to this 
writer a model for how not to do federal programming. 


ESEA Title VII Bilingual Education 

A third type of evaluation experience came under ESEA 
Title VII bilingual education. For this grant the Austin 
ISD submitted a 5-year proposal directly to the Office of 
Education in the spring of 1976. It had been initially 
reviewed by the Texas Education Agency. Although it is 
customary for Title VII to require third-party 
evaluation, the Office of Education program officer 
working with Austin at that time was uniquely interested 
in true research and was convinced that the 
organizational placement of the Austin ISD's Office of 
Research and Evaluation, reporting directly to the 
superintendent and the board, did indeed make its program 
independent. The officer believed that it could function 
within the district and with the Office of Education as a 
third-party evaluator and that it could produce work of 
value to bilingual evaluation in a special way. This 
5-year grant has permitted a longitudinal evaluation of 
the district's bilingual education effort that has 
provided distinctive information and has had a real 
influence on the conduct of the bilingual program in the 
school system. It constitutes one of the few 
longitudinal evaluations of bilingual program students in 
the country; the findings have been disseminated through 
a national conference held in August 1980 with the joint 
support of the National Institute of Education, the Texas 
Education Agency, the Austin ISD, and a number of other 
agencies. 

The budgets during those years have been adequate to 
permit a fairly high-quality evaluation that focused in 
its early years on implementation and process evaluation 
and later on the longitudinal outcomes. The first-year 
budget was $60,094 with a program budget of $563,000. 
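
The Austin ISD figures quoted in this section are easier to compare when each evaluation budget is expressed as a share of its program budget. The sketch below is an illustration added for that purpose, not part of the original paper; the class name BudgetPair and the list name AUSTIN_BUDGETS are hypothetical, and the dollar amounts are those given in the text (the early ESAA bilingual figure is the "as much as" upper value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetPair:
    """One evaluation budget and the program budget it belongs to (dollars)."""
    label: str
    evaluation: float
    program: float

    @property
    def share_percent(self) -> float:
        # Evaluation budget as a percentage of the program budget.
        return 100.0 * self.evaluation / self.program


# Figures as quoted in the text for the Austin ISD.
AUSTIN_BUDGETS = [
    BudgetPair("ESAA bilingual, early 1970s", 84_000, 840_000),
    BudgetPair("ESAA basic, 1977", 5_400, 488_900),
    BudgetPair("ESAA basic, 1978", 12_000, 414_255),
    BudgetPair("ESAA basic, 1979", 3_000, 163_970),
    BudgetPair("ESEA Title VII, first year", 60_094, 563_000),
]


def shares(pairs):
    """Map each label to its evaluation share, rounded to one decimal place."""
    return {pair.label: round(pair.share_percent, 1) for pair in pairs}


if __name__ == "__main__":
    for label, share in shares(AUSTIN_BUDGETS).items():
        print(f"{label}: {share:.1f} percent of program budget")
```

On these figures the early ESAA bilingual evaluation and the first-year Title VII evaluation each received roughly a tenth of the program budget, while the ESAA basic evaluations of 1977 through 1979 received between about 1 and 3 percent.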


Summary 

Federal program evaluations are secured by LEAs through 
applications to one of three agencies: SEAs, regional 
offices of the Department of Education, or the Washington 
office of the Department. The LEA application to the SEA 
is typical of Title I, Title I migrant, and Title IV of 
ESEA; of certain vocational programs; and of certain 
special education programs. Generally, these grants are 
"flow-through" monies: that is, funds are allocated to 
states based on such factors as census information about 
the number of low-income students in a state. In some 
cases, the state in turn allocates set funds to districts 
based upon similar census information. In other cases, 
such as with Title IVC for innovative programming, funds 
are allocated at the state level on a competitive award 
basis. ESAA grants have come through the regional office 
in the past and more recently through Washington. The 
ESEA Title VII bilingual grant is typical of awards 
secured directly from Washington. These are generally 
competitive although there is little doubt that political 
factors weigh heavily in the decisions. For example, the 
size and importance of bilingual populations within a 
state and city seem to be important factors in decisions 
on Title VII. 

Methods and sources of funding are constantly changing 
at every level, as indicated by the shift in ESAA funds 
from the regional office to Washington. Other funds may 
be shifted from Washington to the SEA. Each such change 
results in changes in the procedures for securing funds. 
Rare is the evaluation office in which staff remains 
sufficiently aware of these changes and of new sources of 
funds to be sure that all the available resources for 
evaluation are tapped. 

At the SEA level, funding for evaluation is typically 
a portion of the funds set aside for administrative 
costs. This arrangement tends to pit the evaluation unit 
at the SEA level against the program administration for 
resources. The SEA policy on evaluation may well be the 
determining factor in how much is allocated to 
evaluation. Some states, particularly large ones such as 
Texas, will also have regional units or service centers. 
... small districts on a contracted basis. In some cases 
they compete with LEAs for grants, such as Title IVC, and 
their evaluation activities on those grants will parallel 
those of the LEAs; their evaluation reports will be 
provided to the central SEA just as those by the LEAs. 

Regardless of the source of the funds, it should be 
clear that the size and content of the evaluation 
components of all programs are much influenced by program 
officers at local, federal, and state levels. In the 
Drezek and Higgins (1980) survey, only 21 percent of 
state and local evaluation units reported that evaluation 
costs were allocated on the basis of a fixed percentage 
(see Table C-4). Therefore, it is important to note that 
the control of the budget by program officials is likely 
to have a real impact on the content and potential 
credibility of evaluations. 


WHO DOES EVALUATIONS? 

In most states certification standards are applied to 
personnel in federal programs. For example, a counselor, 
administrator, or supervisor must be certified to fill 
those roles in Texas. In general, evaluators are not 
certified and no standards are applied to the personnel 
filling the role of evaluator. In some LEAs and SEAs, 
the federal program director or coordinator may bear full 
responsibility for evaluation, and even in agencies with 
substantial evaluation units, small federal evaluations 
may be done by program staff. Typically, when program 
staff are given the responsibility for evaluation, they 
will have neither training nor experience in evaluation 
methodology, measurement, or statistical analysis. The 
author has observed many small school districts in which 
the person charged with Title I program evaluation is a 
reading teacher brought directly from the classroom, not 
only with no training in evaluation, but also with a weak 
background in mathematics. 

By contrast, in some states and for some programs, 
third-party or contracted evaluations are the rule. The 
qualifications of the personnel in the contracting 
agencies will generally vary as much as those of the 
staff in the LEAs. In addition, although third-party 
evaluations are supposed to ensure a lack of bias, the 
contractor sometimes has an eye on future contracts and 
may well be gentler in approach than internal evaluators 
who are permanent staff. 


TABLE C-4 ... Budget in Each Type of Agency 

                                      Smaller LEAs     Larger LEAs 
                                      (number = 28)    (number = 24) 
                                      percent using    percent using 
Method                                method           method 

A roughly fixed percentage of 
program costs is used.                     25               21 

The amount is determined by the 
scope of evaluation work.                  54               58 

As much as possible, since 
sufficient amount is seldom 
received.                                  25                4 

Other method; examples included 
"all three of the above," "no 
fixed rule," need to consider 
salary levels of available staff.          21               21 

NOTE: Some respondents indicated using more than one method. 
The number of people who indicated that they used a particular 
method was usually slightly larger than the number who went on 
to report the actual percentage, or range of percentages, used. 

SOURCE: Drezek et al. (1980). 

Finally, in many districts and particularly in the 
large urban systems, well-trained and sophisticated 
evaluators with doctorates in research and evaluation 
carry out evaluation tasks. Within those districts 
having research and evaluation units with such staff, 
evaluator competencies are reported to be at a fairly 
high level in most traditional evaluation and statistical 
areas. In the Webster and Stufflebeam survey (1978), for 
example, competencies in areas such as multivariate 
inferential statistics, measurement theory, and 
experimental design were estimated by departments to be 
about 3.5 on a scale of 1 to 4 where 4 is "advanced 
competency." In newer methodologies such as Bayesian ... 
where 1 is "no familiarity." 

Despite the rather optimistic estimate of the 
competencies existing in the larger evaluation units, the 
author feels that even in this area there are 
considerable problems both in the preservice preparation 
of evaluation personnel and in the in-service training of 
current staff. These problems deserve serious consideration. 


Preservice Evaluator Training 

The competencies required in evaluation are many and 
varied. Boruch and Cordray (1980, Ch. 4:1) point out the 
misconception that any one evaluator ever could or should 
have "all the skills necessary for any evaluation 
effort." It is thus obvious that any evaluator training 
program has to involve choices among the many types of 
skills that evaluators may eventually need. The training 
that most applicants have evidenced to the author falls 
short of the minimum requirements needed for a public 
school evaluation office in three fundamental ways. The 
applicants lack the degree of statistical and computer 
programming skills needed; they do not have the 
certification required by many public schools; and they 
do not have adequate preparation for dealing with the 
organizational and political context of the public 
schools. Over the years the author has found that it is 
possible to help bright candidates pick up the latter 
skills and even to provide rather quickly a necessary 
understanding of the evaluation task as opposed to the 
research task, but the minimum statistical and computer 
skills are an absolute entry necessity. Many of the 
current "evaluation training" programs focus on 
evaluation theory, but fail to provide adequate training 
in the fundamental skills. Even though many school 
systems do make it possible to hire evaluation staff 
without teacher or administrator certification, few will 
permit the evaluator without those credentials to move to 
administrative positions in the evaluation office. Many 
evaluators do not even realize that such credentials are 
needed although in many cases it might have been 
relatively easy for them to pick up such certification as 
a part of their graduate programs. 

There are a number of steps that might lead to better 
preservice training that could be taken by the Department 
of Education or Congress. For example: 


• ... receiving federal support might be required to involve 
in-service evaluators; 

• Federal support might be given to graduate 
training programs that contain provisions for field 
experience and internships in an LEA or SEA; 

• Field experiences in an LEA or SEA could be 
offered early in a training sequence, thus providing 
exposure to requirements in those settings; 

• Support might be given to interchanges between 
university and SEA or LEA evaluation staff of one or two 
semester lengths so that university programs do not 
become too insular. 


In-Service Evaluator Training 

Since a preservice program cannot possibly give an 
evaluator all the skills that will eventually be needed 
and since many practicing evaluators do not presently 
have even the minimum skills, better in-service training 
opportunities for evaluators are desperately needed. 

Many conditions limit practicing evaluators from 
maintaining and increasing their skills at the present 
time. Public school evaluation is an all-consuming 
role. An evaluator works 12 months, with summer bringing 
the heaviest work load; because resources are often 
inadequate, the workday and workweek are far longer than 
those of the average worker. Therefore, once an 
evaluator is on the job, there simply is not sufficient 
time available to renew or enhance skills. Turnover of 
evaluation staff is high: the Austin ISD loses 25 
percent of its evaluation staff (15 senior and 20 junior 
professionals) every year. Perhaps there is such high 
attrition not only because of the time demands but also 
because evaluation is an emotionally difficult field. 
The constant negotiations necessary have been described 
in several chapters of this report, but inevitably, many 
practicing educators fear and dislike evaluation and 
resent the power that comes with evaluation information. 
The evaluator must deal with those negative feelings on a 
daily basis. At the same time, the professional rewards 
for an evaluator in an LEA or SEA are few. The social 
science research community tends not to esteem evaluation 
work very highly, and evaluation specialists in 
universities give limited recognition to work carried on 
elsewhere. Thus, there is little in an evaluator's ... 
not to mention participating in additional training if it 
were available. In fact, however, additional in-service 
training is really not even available. There are such 
things as AERA presessions, and the Austin ISD staff 
regularly participate in those. There are a few 
week-long university sessions offered during the summer, 
but summer is the busiest time of the year for an 
evaluator. (The only time with any slack at all in the 
Austin schedule is November, December, and January.) And 
when the evaluator does participate in any of these 
activities, they tend to be piecemeal and disjointed. 

In the face of such a grim diagnosis, are there things 
that could be done to improve in-service learning 
opportunities for evaluators? Yes, but most of those 
things will be very expensive, such as: 

• Post-doctoral residential programs in which 
evaluators return to university training for a semester 
or two; 

• Exchange programs between university and 
LEA-SEA staff, mentioned above, which would benefit the 
evaluator as well as the university programs; 

• Special project assignments at the federal level 
with built-in training by resident staff; 

• Special training sessions planned and offered on 
a sequential basis at times favorable to LEA and SEA 
evaluation schedules; 

• Visiting scholar programs such as those already 
being offered on a limited basis by the Center for the 
Study of Evaluation. 

In addition to such formal efforts, however, much can 
be done on an informal basis to encourage an evaluator's 
professionalism and to provide incentives for learning. 
The author has received enormous benefits in that sense 
from the network membership established through Division 
H of AERA and the Directors of Research and Evaluation. 
The evaluation report awards given annually by Division H 
were created to provide recognition for evaluation work. 
The new journal Educational Evaluation and Policy 
Analysis may provide a publication forum for evaluators. 
Recently, the Title I technical assistance center for the 
region serving Texas has brought together the Title I 
evaluators from large cities to form a network 
relationship for this region. Such networks could be of 
considerable help in increasing the professionalism of 


WHAT HAPPENS IN EVALUATIONS? 

Compliance activities probably predominate in the 
majority of federal program evaluations at both the SEA 
and LEA levels. In many SEAs this may be almost the sole 
preoccupation. They will design annual report documents 
to gather information from LEAs, gather such information, 
and provide it in turn to federal offices. They are 
likely also to conduct or participate in monitoring 
visits to LEAs to check fiscal and program plan 
compliance. Only a few states currently attempt more 
substantive studies designed to influence state plans for 
the use of federal program funds or to evaluate the 
effectiveness of program activities, although the 
activities in several states are noteworthy. 

At the local level, the first priority activities for 
the evaluation unit also may well be data collection 
relative to compliance. For example, one of the largest 
aspects of Title I evaluation may be the collection of 
data on low-income enrollments by campus, the 
identification of students eligible for service based on 
low achievement, and the location of students who attend 
nonpublic schools or who have dropped out. Until the advent of the 
Title I models, much of the reporting involved little if 
any analysis. Similar activities and numbers are 
fundamental in most federal program evaluation efforts. 

After these compliance or record-keeping types of 
activities, the measurement of performance relative to 
set objectives is probably the next most typical 
evaluation activity. Great variety exists across 
programs in the type of objectives established. I have 
already touched on those used in ESAA programs; other 
types may range from achievement outcome objectives to 
service objectives based on the number of participants 
served. The survey of Title I programs in the Southwest 
mentioned earlier (Doss 1979) yielded information that 
demonstrates both the nature of Title I objectives in 
reading and a feel for the variety of test instruments 
used. (Some representative samples are shown in Table 
C-5.) Boruch and Cordray (1980, Ch. 5:11-12) have 
appropriately criticized such objectives as arbitrary and 
insufficient as standards for evaluation. After far too 

planning, but they are a poor tool for evaluation. 

Only in a few instances are substantive, long-range, 
or cumulative effects of federal programs examined. As 
we in Austin ISD have struggled with federal program 
evaluation over the years, we have become convinced that 
such evaluation produces the best information and leads 
to the best utilization. 

An interesting trend in the last few years has been 
toward what have been called "interpretive analyses," 
such as: Impact of Title I: A Decade of Progress (Moore 
and Turner 1976); Limitations of a Standard Perspective 
on Program Evaluation: The Example of Ten Years of 
Teacher Corps Evaluation (Fox 1977); Evaluation in the 
Seventies: What We Have Learned About Program 
Development and Evaluation (Holley 1977). These reports 
try to bring together information gained from discrete 
evaluation efforts either across years or across programs. 


HOW ARE EVALUATIONS REPORTED? 

Evaluations are reported in a number of ways, both formal 
and informal. There is probably less uniformity from 
district to district in reporting than in either 
budgeting or in activities. Again, it may be 
illustrative to use the Austin ISD procedures as the 
center of this discussion of reporting. ESEA Title I 
involves the most elaborate reporting and is therefore 
used as the example. The flow of information is charted 
in Figure C-1. 

The school year in Austin runs from July 1 each year 
to June 30 the following year. Austin's major reports 
come at the end of the year, and the month of June is a 
hectic, full month of analysis, interpretation, and 
report writing. As for all Austin ISD evaluation 
projects, the Title I evaluation staff prepare a final 
technical report and a 15-page final report. The 
technical report consists of appendices covering each 
data collection effort. It is long and voluminous; only 
a few copies are produced. The 15-page report goes into 
a book called Findings Volume. The short report is the 
major communication vehicle about the project. It covers 
the essential results first, then describes the project 
and the evaluation and provides some discussion of the 
results. This short report evolved from our growing 


FIGURE C-1 Evaluation reporting for ESEA Title I, 
Austin, Texas. 

read at all. In addition, Title I staff must complete an 
AIR report—an annual information report—to the SEA. 

This is a form containing numbers, analysis of the 
achievements of various components, and a space to 
indicate changes to be made as a result of the evaluation 
data. The Texas Education Agency has put considerable 
effort into improving this reporting form over the years 
in an attempt to encourage good evaluation and 
utilization. 

The AIR report is signed by the superintendent and 
submitted to the Texas Education Agency. It is not 
reviewed by the school board primarily because the board 
will receive the Findings Volume, which contains the same 
results but in the usual district format. The format is 
of concern because, given the limited time available for 
the presentation and discussion of evaluation results, it 
is important not to have to expend time or effort to 
explain differing formats. Soon after June 30, which is 
the annual deadline for the completion of final 
reporting, a session with the school board to review all 
results is held. Thereafter, all reports become public 
information and freely available. Copies of both the 
technical and final reports are placed in the board 
office, the district's professional library, and the 
Office of Research and Evaluation. Presentations of the 
results are then arranged early in the school year for 
principals, instructional staff, and various other 
groups. All of these formal presentations, however, are 
not nearly so important as the informal discussions that 
subsequently occur. Knowledge of important findings 
relevant to a specific instructional supervisor or 
administrator may be shared over coffee or lunch. In 
particular, findings may be reviewed during planning 
sessions for particular programs or activities. 

A follow-up reporting activity for the past few years 
has been a short brochure summarizing Title I results for 
teachers and parents. Results are also mentioned in 
newsletters. 

Another critical reporting period for Title I comes 
during the early part of the calendar year. It is the 
needs assessment for the preparation of the next year's 
program plan. This assessment reports data about where 
students will be and what achievement levels are. From 
this report, Title I schools for the following year will 
be designated and cut-offs for eligibility will be 
established. The report is mainly for in-district use. 


WHAT IS THE IMPACT AND USE OF EVALUATION? 

Given the picture described above, it would hardly be 
surprising if the impact and use of evaluation at the 
state, regional, or local level were difficult to trace 
or document even if we had good procedures for doing so. 
Much of the current literature on utilization seems to 
conclude that utilization does occur, but that it takes 
diverse and difficult-to-trace routes. This writer's 
subjective observations concur with that conclusion. As 
a program officer from another Texas district told a 
group recently, prior to the advent of federal programs 
you could walk into a school and ask how well the 
students were performing and never get anything but 
subjective answers. Now schools all over the state know 
precise levels at which students, schools, and districts 
are performing. Sometimes they can even tell you why the 
levels are what they are. Because federal programs are 
now so pervasive, we often fail to recognize just how 
great their impact on the conduct of schooling has been. 

It has been clearly demonstrated in Texas that where 
evaluation produces useful results, they do get used in 
program design. Eventually. 

This is not to say that impact and utilization are 
what one would wish. It is of major concern to this 
writer that the effects of evaluation are only a fraction 
of what they might have been if the resources that have 
been available had been more carefully guided and 
targeted. However, evaluation has been an innovation and 
we are only now learning many of the things we needed to 
know about its implementation. 


is one of the most serious handicaps to extensive use. 

It has been a particular idea of this writer that on 
programs such as Title I or Title VII, for which we are 
expending rather large sums in local evaluations, we 
might find better ways to capitalize on that evaluation 
effort. If evaluations of compensatory programs were 
coordinated in even a minimal way, how much richer our 
evaluations might have been. For example, teachers' 
aides and other instructional aides are commonly used in 
various compensatory programs, yet their effectiveness 
has been examined only in an incidental way in a few 
evaluations. What many of us have found in those 
examinations has, however, been disturbing. The data are 
not complete enough for conclusive statements about the 
effectiveness of aides; they might have been if a larger 
number of school districts had examined how aides were 
being used and what the effects were. The use of time is 
another important factor that affects outcomes that some 
of us have stumbled on in our evaluations. Again, data 
across a large number of districts collected through 
careful observation studies would be far better than 
estimated numbers on every child in Title I filled in 
capriciously from district to district. What are some of 
the ways such an idea might be accomplished? A number of 
ways can be imagined, varying from fairly indirect to 
direct and controlled. 

In Texas, for example, a number of urban districts 
have regular meetings of their superintendents, 
curriculum staff, and evaluators. These meetings have 
led to the sharing of information within each group. The 
meetings of the evaluation group, the Joint Urban 
Evaluation Council, have resulted in similar studies on 
several topics in the different cities. Measures and 
reports have been exchanged. Support for the national 
directors of research and evaluation (DRE) group, which 
now meets annually for one day prior to the AERA meeting, 
to have more frequent meetings might have similar results 
at the national level. Such a forum could be used for 
the Department of Education to present a set of critical 
issues in compensatory education and possible alternative 
evaluation designs to address these. 

The Title I technical assistance centers (TACs) might 
also be given the task of the informal encouragement of 
such efforts as they work with school districts. In 
informal discussions with one TAC center evaluator, I 



would contribute in the same sense as the regular DRE 
meetings would be that of bringing the Title I evaluators 
together on a regional basis. Although mentioned already 
as a route to improved in-service training for 
evaluators, it could also be a stimulus to shared designs. 

The fundamental lack of important evaluation 
information that could contribute to improved programs 
and failure to coordinate information that does exist are 
not the only handicaps to utilization, however. There 
are other factors. First, federal programs in general 
tend not to be of high concern to most local school 
boards and administrators. This can be interpreted more 
as a matter of time available and priority than as a lack 
of interest (Holley 1980). The federal funds in the 
Austin ISD, for example, are currently about $5 million, 
but this is only a fraction of the total district 
operating budget of well over $100 million. While this 
ratio is smaller than for many districts, it is still 
fairly representative. Austin has had far better 
attention to federal programs and their evaluation since 
the Board of Trustees adopted as one of its top 
priorities to improve the achievement of low 
socioeconomic and minority students. The board adopted 
this priority based on evidence of the enormous deficit 
in the achievement of those students relative to the 
total student body and because they represent a growing 
proportion of the student body. With this general 
priority for these students in the district, federal and 
state compensatory programs come into focus as one of the 
major resources for achieving district priorities. The 
Department of Education may find that strong federal 
program evaluation coincides with strong district 
evaluation. 

Another obstacle to the use of federal program 
evaluation information is the lack of recognition of 
dissemination needs. Typically, an evaluation is 
coterminous with a program grant. For example, when the 
Austin ISD recently applied for a 2-month extension of 
its 5-year study of the Title VII bilingual program in 
order to provide for more extensive dissemination, the 
request was denied despite the fact that no new monies 
were requested. Had our office not felt the evaluation 
results were so important that we devoted nonfederal 
resources to dissemination efforts and continue to do so, 
much of the value of an important evaluation study would 
have been lost. Such constraints mean in many cases that 
no dissemination of findings ever occurs. 



consistently in evaluation realizes that the time 
available for communication of evaluation results is 
never adequate. In a large district with many competing 
communication needs and with many evaluations, this is a 
severe problem. Efficient evaluation units develop 
communication strategies that permit the telescoping of 
information through shorthand forms for reporting. Since 
the data that will have impact at one level of the system 
are not the same as those that will have impact at 
another, the information has to be transmuted innumerable 
times before dissemination is accomplished. Resource 
needs for this effort may well not be recognized. Thus, 
the improvement of utilization must come both through 
better evaluations that produce more useful information 
and through better dissemination and promotion techniques 
on the part of the evaluation staff. Both efforts need 
better recognition and better support from Washington. 


CONCLUSION 

Variation is the theme around which this paper is 
written, and surely that theme has been demonstrated. 
Complexity of relationships may have emerged as a major 
subtheme, however. Figure C-2 lays out some of the 
funding, reporting, and advisory relationships as they 
appear from the experience of the author. Each year the 
complexity seems to increase with a concurrent decrease 
in the flexibility available to the LEA. 

Every increase in complexity has tended to bring 
additional reporting demands to the LEA. Ultimately, the 
bulk of that reporting burden falls on students, 
teachers, and principals. To the extent that such 
reporting has moved beyond their central concerns, it 
becomes meaningless bureaucracy. This in turn has two 
serious side effects. There will be an increased dislike 
and disrespect for "evaluation," and there will be a 
decreased willingness to hear and utilize evaluation 
results. 

Both Congress and the Department of Education would be 
wise to consider such effects in designing national 
evaluation requirements and systems. Ultimately, the 
most successful evaluation of federal programs will be 
that which leads to programs that are winners—winners 
for both students and staff. 

Education, U.S. Department of Health, Education, and 

JOHN GABUSI, Assistant Secretary for Management, U.S. 
Department of Education 

EDWARD B. GLASSMAN, Office for Evaluation and Management, 
U.S. Department of Education 

WILLIAM A. HIGHTOWER, Human Resources Division, U.S. 
General Accounting Office 

HOWARD F. HJELM, Director, Division of Research and 
Demonstration, Office of Vocational and Adult 
Education, U.S. Department of Education 

BOBBY R. HOOVER, Human Resources Division, U.S. General 
Accounting Office 

SAMUEL W. HUNT, Staff, Appropriations Committee, U.S. 
Senate 

*Affiliations of individuals at time of interviews 

JOSEPH S. WHOLEY, former Deputy Assistant Secretary for 

ROSEMARY C. WILSON, Director, Division of Follow-Through, 
Office of Elementary and Secondary Education, U.S. 
Department of Education 

THOMAS R. WOLANIN, Staff Director, Subcommittee on 